Contents

  • 1  Data preparation
  • 2  Data analysis
      • 2.0.1  Let's look at how the concentration of metals (Au, Ag, Pb) changes across the purification stages and describe the conclusions.
  • 3  Project readiness checklist
  • 4  - [x] Jupyter Notebook is open

Gold recovery from ore

Prepare a prototype machine learning model for «Цифра». The company develops solutions that help industrial enterprises operate efficiently.

The model should predict the gold recovery rate from gold-bearing ore. Use the data on extraction and purification parameters.

The model will help optimize production so that the plant is not run under unprofitable conditions.

You need to:

  1. Prepare the data;
  2. Perform exploratory data analysis;
  3. Build and train the model.

To complete the project, use the pandas, matplotlib and sklearn libraries and refer to their documentation.

Data preparation

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import time
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from sklearn.impute import KNNImputer
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

from sklearn.metrics import make_scorer, mean_absolute_error
In [2]:
# remove the limit on the number of displayed columns
pd.set_option('display.max_columns', None)

# remove the limit on column width (disabled)
#pd.set_option('display.max_colwidth', None)

# remove the limit on the number of displayed rows
pd.set_option('display.max_rows', None)

# suppress the pandas chained-assignment warning (SettingWithCopyWarning)
pd.options.mode.chained_assignment = None

# show floats with two decimal places
pd.options.display.float_format = '{:,.2f}'.format

# set the plot style; sns.set() resets the style, so 'ticks' is applied last
sns.set(rc={'figure.dpi':200, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style('ticks')
In [3]:
try:
    gold_train = pd.read_csv('/datasets/gold_recovery_train_new.csv')
    gold_test = pd.read_csv('/datasets/gold_recovery_test_new.csv')
    gold_full = pd.read_csv('/datasets/gold_recovery_full_new.csv')
except FileNotFoundError:
    # fall back to files in the current working directory
    gold_train = pd.read_csv('gold_recovery_train_new.csv')
    gold_test = pd.read_csv('gold_recovery_test_new.csv')
    gold_full = pd.read_csv('gold_recovery_full_new.csv')
In [4]:
gold_train = gold_train.replace(float("-inf"),np.nan)
gold_test = gold_test.replace(float("-inf"),np.nan)
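The cell above replaces only negative infinity. As a minimal sketch, assuming positive infinity should be treated as missing too (an assumption, not something the notebook states), both signs can be replaced at once and the result checked. The frame `df` below is a hypothetical stand-in for the real datasets:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one -inf and one +inf value
df = pd.DataFrame({'a': [1.0, -np.inf, 3.0], 'b': [np.inf, 2.0, 3.0]})

# Replace infinities of both signs with NaN
cleaned = df.replace([np.inf, -np.inf], np.nan)

# Count the infinite values that remain after the replacement
n_inf = int(np.isinf(cleaned.to_numpy()).sum())
print(n_inf)
```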
In [5]:
# look at the dataset sizes
gold_train.shape, gold_test.shape, gold_full.shape
Out[5]:
((14149, 87), (5290, 53), (19439, 87))
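The test set has 53 columns against 87 in the training set. A minimal sketch of how the columns absent from the test set could be listed with a set difference. The tiny frames `train` and `test` below are hypothetical stand-ins for `gold_train` and `gold_test`:

```python
import pandas as pd

# Hypothetical miniature stand-ins for gold_train and gold_test
train = pd.DataFrame(columns=['date', 'rougher.input.feed_au',
                              'rougher.output.recovery', 'final.output.recovery'])
test = pd.DataFrame(columns=['date', 'rougher.input.feed_au'])

# Columns present in the training set but absent from the test set
missing_in_test = sorted(set(train.columns) - set(test.columns))
print(missing_in_test)
```

On the real data the same expression would use `gold_train.columns` and `gold_test.columns`.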
In [6]:
# check for fully duplicated rows
gold_train.duplicated().sum(), gold_test.duplicated().sum(), gold_full.duplicated().sum()
Out[6]:
(0, 0, 0)
In [7]:
def percentage_missing_values(a, b, c, limit=100):
    """Print the overall share of missing cells in each dataframe."""
    frames = [(a, 'gold_train'), (b, 'gold_test'), (c, 'gold_full')]
    for df, name in frames:
        total_count = df.size  # rows * columns
        missing_count = df.isna().sum().sum()
        missing_percentage = (missing_count / total_count) * 100
        print(f'Share of missing values in dataframe {name}: {missing_percentage:.2f}%')

        # warn when the share exceeds the limit (the default of 100 never triggers)
        if missing_percentage > limit:
            print(f'Warning! The missing-value limit ({limit}%) is exceeded.')

percentage_missing_values(gold_train, gold_test, gold_full)
Share of missing values in dataframe gold_train: 0.33%
Share of missing values in dataframe gold_test: 0.03%
Share of missing values in dataframe gold_full: 0.26%

Let's carry out exploratory analysis of each dataset separately.

We start with the gold_train dataset.

In [8]:
gold_train.head()
Out[8]:
date final.output.concentrate_ag final.output.concentrate_pb final.output.concentrate_sol final.output.concentrate_au final.output.recovery final.output.tail_ag final.output.tail_pb final.output.tail_sol final.output.tail_au primary_cleaner.input.sulfate primary_cleaner.input.depressant primary_cleaner.input.feed_size primary_cleaner.input.xanthate primary_cleaner.output.concentrate_ag primary_cleaner.output.concentrate_pb primary_cleaner.output.concentrate_sol primary_cleaner.output.concentrate_au primary_cleaner.output.tail_ag primary_cleaner.output.tail_pb primary_cleaner.output.tail_sol primary_cleaner.output.tail_au primary_cleaner.state.floatbank8_a_air primary_cleaner.state.floatbank8_a_level primary_cleaner.state.floatbank8_b_air primary_cleaner.state.floatbank8_b_level primary_cleaner.state.floatbank8_c_air primary_cleaner.state.floatbank8_c_level primary_cleaner.state.floatbank8_d_air primary_cleaner.state.floatbank8_d_level rougher.calculation.sulfate_to_au_concentrate rougher.calculation.floatbank10_sulfate_to_au_feed rougher.calculation.floatbank11_sulfate_to_au_feed rougher.calculation.au_pb_ratio rougher.input.feed_ag rougher.input.feed_pb rougher.input.feed_rate rougher.input.feed_size rougher.input.feed_sol rougher.input.feed_au rougher.input.floatbank10_sulfate rougher.input.floatbank10_xanthate rougher.input.floatbank11_sulfate rougher.input.floatbank11_xanthate rougher.output.concentrate_ag rougher.output.concentrate_pb rougher.output.concentrate_sol rougher.output.concentrate_au rougher.output.recovery rougher.output.tail_ag rougher.output.tail_pb rougher.output.tail_sol rougher.output.tail_au rougher.state.floatbank10_a_air rougher.state.floatbank10_a_level rougher.state.floatbank10_b_air rougher.state.floatbank10_b_level rougher.state.floatbank10_c_air rougher.state.floatbank10_c_level rougher.state.floatbank10_d_air rougher.state.floatbank10_d_level rougher.state.floatbank10_e_air rougher.state.floatbank10_e_level 
rougher.state.floatbank10_f_air rougher.state.floatbank10_f_level secondary_cleaner.output.tail_ag secondary_cleaner.output.tail_pb secondary_cleaner.output.tail_sol secondary_cleaner.output.tail_au secondary_cleaner.state.floatbank2_a_air secondary_cleaner.state.floatbank2_a_level secondary_cleaner.state.floatbank2_b_air secondary_cleaner.state.floatbank2_b_level secondary_cleaner.state.floatbank3_a_air secondary_cleaner.state.floatbank3_a_level secondary_cleaner.state.floatbank3_b_air secondary_cleaner.state.floatbank3_b_level secondary_cleaner.state.floatbank4_a_air secondary_cleaner.state.floatbank4_a_level secondary_cleaner.state.floatbank4_b_air secondary_cleaner.state.floatbank4_b_level secondary_cleaner.state.floatbank5_a_air secondary_cleaner.state.floatbank5_a_level secondary_cleaner.state.floatbank5_b_air secondary_cleaner.state.floatbank5_b_level secondary_cleaner.state.floatbank6_a_air secondary_cleaner.state.floatbank6_a_level
0 2016-01-15 00:00:00 6.06 9.89 5.51 42.19 70.54 10.41 0.90 16.90 2.14 127.09 10.13 7.25 0.99 8.55 10.39 19.53 34.17 14.94 2.53 7.48 2.11 1,549.78 -498.91 1,551.43 -516.40 1,549.87 -498.67 1,554.37 -493.43 41,885.71 3,481.78 3,520.34 2.84 6.10 2.28 523.55 55.49 36.81 6.49 11.99 6.01 11.84 6.01 11.50 7.10 28.03 19.79 87.11 5.01 0.51 19.15 1.17 999.71 -404.07 1,603.01 -434.72 1,602.38 -442.20 1,598.94 -451.29 1,404.47 -455.46 1,416.35 -451.94 14.50 4.69 8.76 2.61 25.85 -498.53 23.89 -501.41 23.96 -495.26 21.94 -499.34 14.02 -502.49 12.10 -504.72 9.93 -498.31 8.08 -500.47 14.15 -605.84
1 2016-01-15 01:00:00 6.03 9.97 5.26 42.70 69.27 10.46 0.93 16.63 2.22 125.63 10.30 7.25 1.00 8.56 10.50 19.37 34.12 16.25 3.05 6.73 2.35 1,576.17 -500.90 1,575.95 -499.87 1,575.99 -499.32 1,574.48 -498.93 42,050.86 3,498.37 3,489.98 2.86 6.16 2.27 525.29 57.28 35.75 6.48 11.97 6.01 12.00 6.01 11.62 7.28 28.07 20.05 86.84 4.96 0.54 18.97 1.18 1,000.29 -400.07 1,600.75 -449.95 1,600.48 -449.83 1,600.53 -449.95 1,399.23 -450.87 1,399.72 -450.12 14.27 4.59 9.00 2.49 25.88 -499.99 23.89 -500.37 23.97 -500.09 22.09 -499.45 13.99 -505.50 11.95 -501.33 10.04 -500.17 7.98 -500.58 14.00 -599.79
2 2016-01-15 02:00:00 6.06 10.21 5.38 42.66 68.12 10.51 0.95 16.21 2.26 123.82 11.32 7.25 0.99 8.60 10.35 19.17 33.97 16.49 3.12 6.47 2.42 1,601.56 -500.00 1,600.39 -500.61 1,602.00 -500.87 1,599.54 -499.83 42,018.10 3,495.35 3,502.36 2.95 6.12 2.16 530.03 57.51 35.97 6.36 11.92 6.20 11.92 6.20 11.70 7.22 27.45 19.74 86.84 4.84 0.55 18.81 1.16 999.72 -400.07 1,599.34 -450.01 1,599.67 -449.95 1,599.85 -449.95 1,399.18 -449.94 1,400.32 -450.53 14.12 4.62 8.84 2.46 26.01 -499.93 23.89 -499.95 23.91 -499.44 23.96 -499.90 14.02 -502.52 11.91 -501.13 10.07 -500.13 8.01 -500.52 14.03 -601.43
3 2016-01-15 03:00:00 6.05 9.98 4.86 42.69 68.35 10.42 0.88 16.53 2.15 122.27 11.32 7.25 1.00 7.22 8.50 15.98 28.26 16.02 2.96 6.84 2.26 1,599.97 -500.95 1,600.66 -499.68 1,600.30 -500.73 1,600.45 -500.05 42,029.45 3,498.58 3,499.16 3.00 6.04 2.04 542.59 57.79 36.86 6.12 11.63 6.20 11.69 6.20 11.92 7.18 27.34 19.32 87.23 4.66 0.54 19.33 1.08 999.81 -400.20 1,600.06 -450.62 1,600.01 -449.91 1,597.73 -450.13 1,400.94 -450.03 1,400.23 -449.79 13.73 4.48 9.12 2.32 25.94 -499.18 23.96 -499.85 23.97 -500.01 23.95 -499.94 14.04 -500.86 12.00 -501.19 9.97 -499.20 7.98 -500.26 14.01 -600.00
4 2016-01-15 04:00:00 6.15 10.14 4.94 42.77 66.93 10.36 0.79 16.53 2.06 117.99 11.91 7.25 1.01 9.09 9.99 19.20 33.04 16.48 3.11 6.55 2.28 1,601.34 -498.98 1,601.44 -500.32 1,599.58 -500.89 1,602.65 -500.59 42,125.35 3,494.80 3,506.68 3.17 6.06 1.79 540.53 56.05 34.35 5.66 10.96 6.20 10.96 6.19 12.41 7.24 27.04 19.22 86.69 4.55 0.52 19.27 1.01 999.68 -399.75 1,600.21 -449.60 1,600.36 -450.03 1,599.76 -449.91 1,401.56 -448.88 1,401.16 -450.41 14.08 4.47 8.87 2.33 26.02 -500.28 23.96 -500.59 23.99 -500.08 23.96 -499.99 14.03 -499.84 11.95 -501.05 9.93 -501.69 7.89 -500.36 14.00 -601.50
In [9]:
gold_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14149 entries, 0 to 14148
Data columns (total 87 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   date                                                14149 non-null  object 
 1   final.output.concentrate_ag                         14148 non-null  float64
 2   final.output.concentrate_pb                         14148 non-null  float64
 3   final.output.concentrate_sol                        13938 non-null  float64
 4   final.output.concentrate_au                         14149 non-null  float64
 5   final.output.recovery                               14149 non-null  float64
 6   final.output.tail_ag                                14149 non-null  float64
 7   final.output.tail_pb                                14049 non-null  float64
 8   final.output.tail_sol                               14144 non-null  float64
 9   final.output.tail_au                                14149 non-null  float64
 10  primary_cleaner.input.sulfate                       14129 non-null  float64
 11  primary_cleaner.input.depressant                    14117 non-null  float64
 12  primary_cleaner.input.feed_size                     14149 non-null  float64
 13  primary_cleaner.input.xanthate                      14049 non-null  float64
 14  primary_cleaner.output.concentrate_ag               14149 non-null  float64
 15  primary_cleaner.output.concentrate_pb               14063 non-null  float64
 16  primary_cleaner.output.concentrate_sol              13863 non-null  float64
 17  primary_cleaner.output.concentrate_au               14149 non-null  float64
 18  primary_cleaner.output.tail_ag                      14148 non-null  float64
 19  primary_cleaner.output.tail_pb                      14134 non-null  float64
 20  primary_cleaner.output.tail_sol                     14103 non-null  float64
 21  primary_cleaner.output.tail_au                      14149 non-null  float64
 22  primary_cleaner.state.floatbank8_a_air              14145 non-null  float64
 23  primary_cleaner.state.floatbank8_a_level            14148 non-null  float64
 24  primary_cleaner.state.floatbank8_b_air              14145 non-null  float64
 25  primary_cleaner.state.floatbank8_b_level            14148 non-null  float64
 26  primary_cleaner.state.floatbank8_c_air              14147 non-null  float64
 27  primary_cleaner.state.floatbank8_c_level            14148 non-null  float64
 28  primary_cleaner.state.floatbank8_d_air              14146 non-null  float64
 29  primary_cleaner.state.floatbank8_d_level            14148 non-null  float64
 30  rougher.calculation.sulfate_to_au_concentrate       14148 non-null  float64
 31  rougher.calculation.floatbank10_sulfate_to_au_feed  14148 non-null  float64
 32  rougher.calculation.floatbank11_sulfate_to_au_feed  14148 non-null  float64
 33  rougher.calculation.au_pb_ratio                     14149 non-null  float64
 34  rougher.input.feed_ag                               14149 non-null  float64
 35  rougher.input.feed_pb                               14049 non-null  float64
 36  rougher.input.feed_rate                             14141 non-null  float64
 37  rougher.input.feed_size                             14005 non-null  float64
 38  rougher.input.feed_sol                              14071 non-null  float64
 39  rougher.input.feed_au                               14149 non-null  float64
 40  rougher.input.floatbank10_sulfate                   14120 non-null  float64
 41  rougher.input.floatbank10_xanthate                  14141 non-null  float64
 42  rougher.input.floatbank11_sulfate                   14113 non-null  float64
 43  rougher.input.floatbank11_xanthate                  13721 non-null  float64
 44  rougher.output.concentrate_ag                       14149 non-null  float64
 45  rougher.output.concentrate_pb                       14149 non-null  float64
 46  rougher.output.concentrate_sol                      14127 non-null  float64
 47  rougher.output.concentrate_au                       14149 non-null  float64
 48  rougher.output.recovery                             14149 non-null  float64
 49  rougher.output.tail_ag                              14148 non-null  float64
 50  rougher.output.tail_pb                              14149 non-null  float64
 51  rougher.output.tail_sol                             14149 non-null  float64
 52  rougher.output.tail_au                              14149 non-null  float64
 53  rougher.state.floatbank10_a_air                     14148 non-null  float64
 54  rougher.state.floatbank10_a_level                   14148 non-null  float64
 55  rougher.state.floatbank10_b_air                     14148 non-null  float64
 56  rougher.state.floatbank10_b_level                   14148 non-null  float64
 57  rougher.state.floatbank10_c_air                     14148 non-null  float64
 58  rougher.state.floatbank10_c_level                   14148 non-null  float64
 59  rougher.state.floatbank10_d_air                     14149 non-null  float64
 60  rougher.state.floatbank10_d_level                   14149 non-null  float64
 61  rougher.state.floatbank10_e_air                     13713 non-null  float64
 62  rougher.state.floatbank10_e_level                   14149 non-null  float64
 63  rougher.state.floatbank10_f_air                     14149 non-null  float64
 64  rougher.state.floatbank10_f_level                   14149 non-null  float64
 65  secondary_cleaner.output.tail_ag                    14147 non-null  float64
 66  secondary_cleaner.output.tail_pb                    14139 non-null  float64
 67  secondary_cleaner.output.tail_sol                   12544 non-null  float64
 68  secondary_cleaner.output.tail_au                    14149 non-null  float64
 69  secondary_cleaner.state.floatbank2_a_air            13932 non-null  float64
 70  secondary_cleaner.state.floatbank2_a_level          14148 non-null  float64
 71  secondary_cleaner.state.floatbank2_b_air            14128 non-null  float64
 72  secondary_cleaner.state.floatbank2_b_level          14148 non-null  float64
 73  secondary_cleaner.state.floatbank3_a_air            14145 non-null  float64
 74  secondary_cleaner.state.floatbank3_a_level          14148 non-null  float64
 75  secondary_cleaner.state.floatbank3_b_air            14148 non-null  float64
 76  secondary_cleaner.state.floatbank3_b_level          14148 non-null  float64
 77  secondary_cleaner.state.floatbank4_a_air            14143 non-null  float64
 78  secondary_cleaner.state.floatbank4_a_level          14148 non-null  float64
 79  secondary_cleaner.state.floatbank4_b_air            14148 non-null  float64
 80  secondary_cleaner.state.floatbank4_b_level          14148 non-null  float64
 81  secondary_cleaner.state.floatbank5_a_air            14148 non-null  float64
 82  secondary_cleaner.state.floatbank5_a_level          14148 non-null  float64
 83  secondary_cleaner.state.floatbank5_b_air            14148 non-null  float64
 84  secondary_cleaner.state.floatbank5_b_level          14148 non-null  float64
 85  secondary_cleaner.state.floatbank6_a_air            14147 non-null  float64
 86  secondary_cleaner.state.floatbank6_a_level          14148 non-null  float64
dtypes: float64(86), object(1)
memory usage: 9.4+ MB
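The `date` column is stored as `object`, that is, as strings. A minimal sketch of parsing it into datetimes and using it as the index, assuming the format seen in the `head()` output; the frame `df` is a hypothetical two-row sample, not the real dataset:

```python
import pandas as pd

# Hypothetical sample mimicking the 'date' column stored as strings
df = pd.DataFrame({'date': ['2016-01-15 00:00:00', '2016-01-15 01:00:00'],
                   'final.output.recovery': [70.54, 69.27]})

# Parse the strings into datetime values and use them as the index
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S')
df = df.set_index('date')

# Time step between consecutive measurements
step = df.index[1] - df.index[0]
print(df.index.dtype, step)
```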
In [10]:
gold_train.isna().sum()
Out[10]:
date                                                     0
final.output.concentrate_ag                              1
final.output.concentrate_pb                              1
final.output.concentrate_sol                           211
final.output.concentrate_au                              0
final.output.recovery                                    0
final.output.tail_ag                                     0
final.output.tail_pb                                   100
final.output.tail_sol                                    5
final.output.tail_au                                     0
primary_cleaner.input.sulfate                           20
primary_cleaner.input.depressant                        32
primary_cleaner.input.feed_size                          0
primary_cleaner.input.xanthate                         100
primary_cleaner.output.concentrate_ag                    0
primary_cleaner.output.concentrate_pb                   86
primary_cleaner.output.concentrate_sol                 286
primary_cleaner.output.concentrate_au                    0
primary_cleaner.output.tail_ag                           1
primary_cleaner.output.tail_pb                          15
primary_cleaner.output.tail_sol                         46
primary_cleaner.output.tail_au                           0
primary_cleaner.state.floatbank8_a_air                   4
primary_cleaner.state.floatbank8_a_level                 1
primary_cleaner.state.floatbank8_b_air                   4
primary_cleaner.state.floatbank8_b_level                 1
primary_cleaner.state.floatbank8_c_air                   2
primary_cleaner.state.floatbank8_c_level                 1
primary_cleaner.state.floatbank8_d_air                   3
primary_cleaner.state.floatbank8_d_level                 1
rougher.calculation.sulfate_to_au_concentrate            1
rougher.calculation.floatbank10_sulfate_to_au_feed       1
rougher.calculation.floatbank11_sulfate_to_au_feed       1
rougher.calculation.au_pb_ratio                          0
rougher.input.feed_ag                                    0
rougher.input.feed_pb                                  100
rougher.input.feed_rate                                  8
rougher.input.feed_size                                144
rougher.input.feed_sol                                  78
rougher.input.feed_au                                    0
rougher.input.floatbank10_sulfate                       29
rougher.input.floatbank10_xanthate                       8
rougher.input.floatbank11_sulfate                       36
rougher.input.floatbank11_xanthate                     428
rougher.output.concentrate_ag                            0
rougher.output.concentrate_pb                            0
rougher.output.concentrate_sol                          22
rougher.output.concentrate_au                            0
rougher.output.recovery                                  0
rougher.output.tail_ag                                   1
rougher.output.tail_pb                                   0
rougher.output.tail_sol                                  0
rougher.output.tail_au                                   0
rougher.state.floatbank10_a_air                          1
rougher.state.floatbank10_a_level                        1
rougher.state.floatbank10_b_air                          1
rougher.state.floatbank10_b_level                        1
rougher.state.floatbank10_c_air                          1
rougher.state.floatbank10_c_level                        1
rougher.state.floatbank10_d_air                          0
rougher.state.floatbank10_d_level                        0
rougher.state.floatbank10_e_air                        436
rougher.state.floatbank10_e_level                        0
rougher.state.floatbank10_f_air                          0
rougher.state.floatbank10_f_level                        0
secondary_cleaner.output.tail_ag                         2
secondary_cleaner.output.tail_pb                        10
secondary_cleaner.output.tail_sol                     1605
secondary_cleaner.output.tail_au                         0
secondary_cleaner.state.floatbank2_a_air               217
secondary_cleaner.state.floatbank2_a_level               1
secondary_cleaner.state.floatbank2_b_air                21
secondary_cleaner.state.floatbank2_b_level               1
secondary_cleaner.state.floatbank3_a_air                 4
secondary_cleaner.state.floatbank3_a_level               1
secondary_cleaner.state.floatbank3_b_air                 1
secondary_cleaner.state.floatbank3_b_level               1
secondary_cleaner.state.floatbank4_a_air                 6
secondary_cleaner.state.floatbank4_a_level               1
secondary_cleaner.state.floatbank4_b_air                 1
secondary_cleaner.state.floatbank4_b_level               1
secondary_cleaner.state.floatbank5_a_air                 1
secondary_cleaner.state.floatbank5_a_level               1
secondary_cleaner.state.floatbank5_b_air                 1
secondary_cleaner.state.floatbank5_b_level               1
secondary_cleaner.state.floatbank6_a_air                 2
secondary_cleaner.state.floatbank6_a_level               1
dtype: int64
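Most columns above have only a handful of gaps, while a few stand out (for example, `secondary_cleaner.output.tail_sol` with 1605). A minimal sketch of how the output could be narrowed to columns that actually have gaps, expressed as a percentage and sorted. The frame `df` is a hypothetical toy example:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for gold_train
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0],
                   'b': [1.0, 2.0, 3.0, 4.0],
                   'c': [np.nan, np.nan, 3.0, 4.0]})

# Share of missing values per column, in percent
missing_share = df.isna().mean() * 100

# Keep only columns that have gaps, largest share first
missing_share = missing_share[missing_share > 0].sort_values(ascending=False)
print(missing_share)
```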
In [11]:
gold_train.describe()
Out[11]:
final.output.concentrate_ag final.output.concentrate_pb final.output.concentrate_sol final.output.concentrate_au final.output.recovery final.output.tail_ag final.output.tail_pb final.output.tail_sol final.output.tail_au primary_cleaner.input.sulfate primary_cleaner.input.depressant primary_cleaner.input.feed_size primary_cleaner.input.xanthate primary_cleaner.output.concentrate_ag primary_cleaner.output.concentrate_pb primary_cleaner.output.concentrate_sol primary_cleaner.output.concentrate_au primary_cleaner.output.tail_ag primary_cleaner.output.tail_pb primary_cleaner.output.tail_sol primary_cleaner.output.tail_au primary_cleaner.state.floatbank8_a_air primary_cleaner.state.floatbank8_a_level primary_cleaner.state.floatbank8_b_air primary_cleaner.state.floatbank8_b_level primary_cleaner.state.floatbank8_c_air primary_cleaner.state.floatbank8_c_level primary_cleaner.state.floatbank8_d_air primary_cleaner.state.floatbank8_d_level rougher.calculation.sulfate_to_au_concentrate rougher.calculation.floatbank10_sulfate_to_au_feed rougher.calculation.floatbank11_sulfate_to_au_feed rougher.calculation.au_pb_ratio rougher.input.feed_ag rougher.input.feed_pb rougher.input.feed_rate rougher.input.feed_size rougher.input.feed_sol rougher.input.feed_au rougher.input.floatbank10_sulfate rougher.input.floatbank10_xanthate rougher.input.floatbank11_sulfate rougher.input.floatbank11_xanthate rougher.output.concentrate_ag rougher.output.concentrate_pb rougher.output.concentrate_sol rougher.output.concentrate_au rougher.output.recovery rougher.output.tail_ag rougher.output.tail_pb rougher.output.tail_sol rougher.output.tail_au rougher.state.floatbank10_a_air rougher.state.floatbank10_a_level rougher.state.floatbank10_b_air rougher.state.floatbank10_b_level rougher.state.floatbank10_c_air rougher.state.floatbank10_c_level rougher.state.floatbank10_d_air rougher.state.floatbank10_d_level rougher.state.floatbank10_e_air rougher.state.floatbank10_e_level rougher.state.floatbank10_f_air 
rougher.state.floatbank10_f_level secondary_cleaner.output.tail_ag secondary_cleaner.output.tail_pb secondary_cleaner.output.tail_sol secondary_cleaner.output.tail_au secondary_cleaner.state.floatbank2_a_air secondary_cleaner.state.floatbank2_a_level secondary_cleaner.state.floatbank2_b_air secondary_cleaner.state.floatbank2_b_level secondary_cleaner.state.floatbank3_a_air secondary_cleaner.state.floatbank3_a_level secondary_cleaner.state.floatbank3_b_air secondary_cleaner.state.floatbank3_b_level secondary_cleaner.state.floatbank4_a_air secondary_cleaner.state.floatbank4_a_level secondary_cleaner.state.floatbank4_b_air secondary_cleaner.state.floatbank4_b_level secondary_cleaner.state.floatbank5_a_air secondary_cleaner.state.floatbank5_a_level secondary_cleaner.state.floatbank5_b_air secondary_cleaner.state.floatbank5_b_level secondary_cleaner.state.floatbank6_a_air secondary_cleaner.state.floatbank6_a_level
count 14,148.00 14,148.00 13,938.00 14,149.00 14,149.00 14,149.00 14,049.00 14,144.00 14,149.00 14,129.00 14,117.00 14,149.00 14,049.00 14,149.00 14,063.00 13,863.00 14,149.00 14,148.00 14,134.00 14,103.00 14,149.00 14,145.00 14,148.00 14,145.00 14,148.00 14,147.00 14,148.00 14,146.00 14,148.00 14,148.00 14,148.00 14,148.00 14,149.00 14,149.00 14,049.00 14,141.00 14,005.00 14,071.00 14,149.00 14,120.00 14,141.00 14,113.00 13,721.00 14,149.00 14,149.00 14,127.00 14,149.00 14,149.00 14,148.00 14,149.00 14,149.00 14,149.00 14,148.00 14,148.00 14,148.00 14,148.00 14,148.00 14,148.00 14,149.00 14,149.00 13,713.00 14,149.00 14,149.00 14,149.00 14,147.00 14,139.00 12,544.00 14,149.00 13,932.00 14,148.00 14,128.00 14,148.00 14,145.00 14,148.00 14,148.00 14,148.00 14,143.00 14,148.00 14,148.00 14,148.00 14,148.00 14,148.00 14,148.00 14,148.00 14,147.00 14,148.00
mean 5.14 10.13 9.20 44.00 66.52 9.61 2.60 10.51 2.92 133.32 8.87 7.32 0.89 8.20 9.59 10.11 32.39 16.30 3.44 7.53 3.84 1,608.00 -488.78 1,608.61 -489.17 1,608.88 -489.61 1,542.19 -483.46 40,382.65 3,456.61 3,253.36 2.37 8.58 3.52 474.03 60.11 36.31 7.87 11.76 5.85 11.37 5.89 11.78 7.66 28.30 19.44 82.70 5.57 0.65 17.88 1.76 1,124.73 -369.46 1,320.71 -464.26 1,299.36 -465.05 1,210.34 -465.46 1,090.21 -464.92 1,035.49 -464.69 14.28 5.85 6.94 4.25 29.61 -502.22 24.91 -503.70 29.24 -478.24 22.66 -488.92 19.99 -478.70 15.49 -460.23 16.78 -483.96 13.06 -483.97 19.58 -506.80
std 1.37 1.65 2.79 4.91 10.30 2.32 0.97 3.00 0.90 39.43 3.36 0.61 0.37 2.01 2.69 4.06 5.80 3.74 1.49 2.13 1.60 128.39 35.70 131.11 33.60 134.27 35.62 278.32 47.10 380,143.62 5,772.51 6,753.29 0.87 1.90 1.07 104.04 22.42 4.96 1.92 3.28 1.10 3.74 1.12 2.73 1.86 6.10 3.77 14.48 1.04 0.26 3.43 0.71 169.31 93.95 183.16 57.40 213.40 55.90 210.43 55.77 184.61 56.60 175.05 56.65 4.48 2.86 4.16 2.39 5.80 60.28 5.99 62.84 5.64 54.66 5.00 41.93 5.66 50.74 5.26 58.84 5.83 37.89 5.77 39.21 5.76 37.08
min 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 -798.64 0.01 -740.12 0.02 -799.80 0.01 -799.79 -42,235,197.37 -486.60 -264.98 -0.01 0.01 0.01 0.01 9.66 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.59 0.01 10.01 0.02 -0.04 -650.20 -0.65 -650.26 -0.04 -641.00 -0.55 -640.52 -1.97 -649.44 -2.43 -649.88 0.00 0.00 0.00 0.00 0.08 -799.61 0.00 -799.87 0.00 -799.61 0.00 -759.18 0.00 -799.71 0.00 -799.89 -0.37 -797.14 0.65 -800.01 0.20 -809.40
25% 4.21 9.30 7.48 43.28 62.55 8.00 1.91 8.81 2.37 107.01 6.04 6.96 0.61 7.11 8.33 7.49 30.86 13.87 2.42 6.33 2.87 1,595.70 -500.29 1,558.96 -500.38 1,549.87 -500.60 1,452.68 -500.46 39,994.30 2,527.09 2,512.20 2.00 7.13 2.78 420.78 48.97 34.12 6.60 9.86 5.12 9.51 5.20 10.49 6.85 26.70 18.43 79.99 4.92 0.47 15.69 1.31 999.80 -499.79 1,199.37 -500.18 1,103.10 -500.21 1,059.71 -500.36 997.18 -500.25 900.97 -500.48 12.18 3.98 3.23 3.15 25.10 -500.25 22.05 -500.27 24.99 -500.18 19.95 -500.11 14.99 -500.63 11.89 -500.15 11.08 -500.36 8.99 -500.11 14.99 -500.75
50% 4.99 10.30 8.85 44.87 67.43 9.48 2.59 10.51 2.85 133.02 8.04 7.29 0.89 8.23 9.93 9.73 33.23 15.80 3.22 7.71 3.51 1,601.82 -499.91 1,601.82 -499.94 1,601.57 -499.87 1,600.17 -499.83 43,684.31 2,975.89 2,899.81 2.25 8.16 3.42 499.45 55.37 37.02 7.65 11.69 5.95 11.38 6.00 11.75 7.76 29.26 19.95 85.30 5.72 0.63 18.02 1.75 1,001.69 -300.18 1,301.37 -499.76 1,300.21 -499.68 1,200.74 -499.47 1,050.50 -499.61 1,000.05 -499.36 15.36 5.44 7.30 3.98 30.03 -499.96 27.02 -500.01 28.02 -499.88 22.04 -499.97 20.00 -499.68 14.98 -499.39 17.93 -499.70 12.00 -499.91 19.98 -500.06
75% 5.86 11.17 10.49 46.17 72.35 11.00 3.24 11.93 3.43 159.83 11.52 7.70 1.10 9.50 11.31 13.05 35.33 18.45 4.25 8.91 4.49 1,699.72 -499.38 1,700.22 -499.39 1,700.46 -498.80 1,699.36 -498.48 47,760.41 3,716.36 3,596.53 2.66 9.92 4.23 547.33 66.08 39.42 9.07 13.61 6.60 13.50 6.70 13.43 8.60 31.74 21.39 90.17 6.31 0.79 19.94 2.19 1,299.51 -299.96 1,449.55 -400.43 1,450.35 -400.66 1,344.38 -401.05 1,200.05 -400.60 1,100.17 -401.01 17.23 7.80 10.55 4.88 34.89 -499.59 28.94 -499.76 34.99 -436.92 25.97 -499.76 24.99 -477.47 20.06 -400.04 21.35 -487.71 17.98 -453.19 24.99 -499.54
max 16.00 17.03 18.12 52.76 100.00 19.55 5.64 22.32 8.20 250.13 20.05 10.47 2.51 16.08 17.08 22.28 45.93 29.46 9.63 20.62 17.79 2,079.53 -330.13 2,114.91 -347.35 2,013.16 -346.65 2,398.90 -30.60 3,428,098.94 629,638.98 718,684.96 39.38 14.60 7.14 717.51 484.97 48.36 13.13 36.12 9.70 37.98 9.70 24.48 13.62 38.35 28.15 100.00 12.72 3.78 66.12 9.69 1,521.98 -281.04 1,809.19 -296.38 2,499.13 -292.16 1,817.20 -208.33 1,922.64 -272.20 1,706.31 -191.72 23.26 17.04 17.98 26.81 52.65 -127.88 35.15 -212.06 44.26 -191.68 35.07 -159.74 30.12 -245.24 24.01 -145.07 43.71 -275.07 27.93 -157.40 32.19 -104.43
In [12]:
%%time

# Look at the distribution of the data in the gold_train dataset
gold_train.hist(figsize=(50, 40), bins=50, color='brown')
plt.show()
CPU times: user 18.5 s, sys: 850 ms, total: 19.3 s
Wall time: 19.4 s
In [13]:
# Plot the distributions of the targets 'rougher.output.recovery' and 'final.output.recovery' separately

gold_train['rougher.output.recovery'].hist(figsize=(16, 5), alpha=0.7, bins=100, color='black', edgecolor='black')
gold_train['final.output.recovery'].hist(figsize=(16, 5), alpha=0.7, bins=100, color='gold', edgecolor='black')
plt.grid(True)
plt.legend(["Rougher concentrate recovery 'rougher.output.recovery'",
            "Final concentrate recovery 'final.output.recovery'"])
plt.xlabel('Recovery (coefficient)')
plt.ylabel('Number of measurements')
plt.title('Distribution of rougher and final concentrate recovery in the training set')
plt.show()
In [14]:
%%time

# Plot a heatmap of Pearson correlation coefficients for the training set (gold_recovery_train_new dataset).
# The 'date' column holds strings, so it is dropped before computing correlations.
plt.figure(figsize=(50,50))
sns.heatmap(gold_train.drop(columns='date').corr(), annot=True, fmt='.2f', vmin=-1, vmax=1, center=0, cmap='coolwarm')
plt.show()
CPU times: user 36.8 s, sys: 7.25 s, total: 44 s
Wall time: 44.1 s
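An annotated heatmap over 86 numeric columns can be hard to scan. As a complementary sketch, the strongest pairwise correlations can be pulled out of the matrix as a sorted list. The frame `df` below is hypothetical synthetic data, not the gold dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: y depends strongly on x, z is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({'x': x,
                   'y': 2 * x + rng.normal(scale=0.1, size=200),
                   'z': rng.normal(size=200)})

corr = df.corr()

# Keep the upper triangle only (k=1 excludes the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Order the pairs by absolute correlation, strongest first
top = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(top.head())
```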

The gold_test dataset

In [15]:
gold_test.head()
Out[15]:
date primary_cleaner.input.sulfate primary_cleaner.input.depressant primary_cleaner.input.feed_size primary_cleaner.input.xanthate primary_cleaner.state.floatbank8_a_air primary_cleaner.state.floatbank8_a_level primary_cleaner.state.floatbank8_b_air primary_cleaner.state.floatbank8_b_level primary_cleaner.state.floatbank8_c_air primary_cleaner.state.floatbank8_c_level primary_cleaner.state.floatbank8_d_air primary_cleaner.state.floatbank8_d_level rougher.input.feed_ag rougher.input.feed_pb rougher.input.feed_rate rougher.input.feed_size rougher.input.feed_sol rougher.input.feed_au rougher.input.floatbank10_sulfate rougher.input.floatbank10_xanthate rougher.input.floatbank11_sulfate rougher.input.floatbank11_xanthate rougher.state.floatbank10_a_air rougher.state.floatbank10_a_level rougher.state.floatbank10_b_air rougher.state.floatbank10_b_level rougher.state.floatbank10_c_air rougher.state.floatbank10_c_level rougher.state.floatbank10_d_air rougher.state.floatbank10_d_level rougher.state.floatbank10_e_air rougher.state.floatbank10_e_level rougher.state.floatbank10_f_air rougher.state.floatbank10_f_level secondary_cleaner.state.floatbank2_a_air secondary_cleaner.state.floatbank2_a_level secondary_cleaner.state.floatbank2_b_air secondary_cleaner.state.floatbank2_b_level secondary_cleaner.state.floatbank3_a_air secondary_cleaner.state.floatbank3_a_level secondary_cleaner.state.floatbank3_b_air secondary_cleaner.state.floatbank3_b_level secondary_cleaner.state.floatbank4_a_air secondary_cleaner.state.floatbank4_a_level secondary_cleaner.state.floatbank4_b_air secondary_cleaner.state.floatbank4_b_level secondary_cleaner.state.floatbank5_a_air secondary_cleaner.state.floatbank5_a_level secondary_cleaner.state.floatbank5_b_air secondary_cleaner.state.floatbank5_b_level secondary_cleaner.state.floatbank6_a_air secondary_cleaner.state.floatbank6_a_level
0 2016-09-01 00:59:59 210.80 14.99 8.08 1.01 1,398.98 -500.23 1,399.14 -499.92 1,400.10 -500.70 1,399.00 -499.49 13.13 5.64 489.79 62.71 42.02 12.08 16.92 6.15 16.87 6.15 1,001.85 -350.30 1,249.74 -399.11 1,249.75 -399.40 1,198.29 -399.49 999.47 -399.53 949.57 -398.18 24.94 -500.49 14.95 -500.01 20.02 -450.40 13.99 -449.83 12.02 -497.80 8.02 -501.29 7.95 -432.32 4.87 -500.04 26.71 -499.71
1 2016-09-01 01:59:59 215.39 14.99 8.08 0.99 1,398.78 -500.06 1,398.06 -499.78 1,396.15 -499.24 1,399.51 -500.42 13.04 5.53 490.10 61.96 41.19 11.92 17.00 6.00 17.00 6.00 998.69 -350.43 1,248.40 -399.95 1,249.51 -399.63 1,200.51 -399.94 1,000.00 -399.49 950.20 -405.79 24.92 -499.81 14.93 -500.76 19.99 -450.11 14.09 -450.06 12.06 -498.70 8.13 -499.63 7.96 -525.84 4.88 -500.16 25.02 -499.82
2 2016-09-01 02:59:59 215.26 12.88 7.79 1.00 1,398.49 -500.87 1,398.86 -499.76 1,398.08 -502.15 1,399.50 -499.72 13.14 5.43 489.62 66.90 42.55 12.09 16.99 5.85 16.98 5.85 998.52 -349.78 1,247.44 -400.26 1,248.21 -401.07 1,199.77 -400.79 999.93 -399.24 950.32 -400.86 24.91 -500.30 15.00 -500.99 20.04 -450.26 14.08 -449.66 11.96 -498.77 8.10 -500.83 8.07 -500.80 4.91 -499.83 24.99 -500.62
3 2016-09-01 03:59:59 215.34 12.01 7.64 0.86 1,399.62 -498.86 1,397.44 -499.21 1,400.13 -498.36 1,401.07 -501.04 12.40 5.11 476.62 59.87 41.06 12.18 16.53 5.80 16.52 5.80 1,000.28 -350.17 1,251.32 -398.66 1,250.49 -399.75 1,199.40 -397.50 1,001.93 -400.44 950.74 -399.80 24.89 -499.38 14.92 -499.86 20.03 -449.37 14.01 -449.53 12.03 -498.35 8.07 -499.47 7.90 -500.87 4.93 -499.96 24.95 -498.71
4 2016-09-01 04:59:59 199.10 10.68 7.53 0.81 1,401.27 -500.81 1,398.13 -499.50 1,402.17 -500.81 1,399.48 -499.37 11.33 4.77 488.25 63.32 41.27 11.29 13.61 5.74 13.65 5.74 996.54 -350.56 1,304.66 -399.51 1,306.46 -399.05 1,248.70 -400.88 1,058.84 -398.99 949.65 -399.28 24.89 -499.36 14.98 -500.19 19.96 -450.64 14.01 -450.02 12.03 -500.79 8.05 -500.40 8.11 -509.53 4.96 -500.36 25.00 -500.86
In [16]:
gold_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5290 entries, 0 to 5289
Data columns (total 53 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   date                                        5290 non-null   object 
 1   primary_cleaner.input.sulfate               5286 non-null   float64
 2   primary_cleaner.input.depressant            5285 non-null   float64
 3   primary_cleaner.input.feed_size             5290 non-null   float64
 4   primary_cleaner.input.xanthate              5286 non-null   float64
 5   primary_cleaner.state.floatbank8_a_air      5290 non-null   float64
 6   primary_cleaner.state.floatbank8_a_level    5290 non-null   float64
 7   primary_cleaner.state.floatbank8_b_air      5290 non-null   float64
 8   primary_cleaner.state.floatbank8_b_level    5290 non-null   float64
 9   primary_cleaner.state.floatbank8_c_air      5290 non-null   float64
 10  primary_cleaner.state.floatbank8_c_level    5290 non-null   float64
 11  primary_cleaner.state.floatbank8_d_air      5290 non-null   float64
 12  primary_cleaner.state.floatbank8_d_level    5290 non-null   float64
 13  rougher.input.feed_ag                       5290 non-null   float64
 14  rougher.input.feed_pb                       5290 non-null   float64
 15  rougher.input.feed_rate                     5287 non-null   float64
 16  rougher.input.feed_size                     5289 non-null   float64
 17  rougher.input.feed_sol                      5269 non-null   float64
 18  rougher.input.feed_au                       5290 non-null   float64
 19  rougher.input.floatbank10_sulfate           5285 non-null   float64
 20  rougher.input.floatbank10_xanthate          5290 non-null   float64
 21  rougher.input.floatbank11_sulfate           5282 non-null   float64
 22  rougher.input.floatbank11_xanthate          5265 non-null   float64
 23  rougher.state.floatbank10_a_air             5290 non-null   float64
 24  rougher.state.floatbank10_a_level           5290 non-null   float64
 25  rougher.state.floatbank10_b_air             5290 non-null   float64
 26  rougher.state.floatbank10_b_level           5290 non-null   float64
 27  rougher.state.floatbank10_c_air             5290 non-null   float64
 28  rougher.state.floatbank10_c_level           5290 non-null   float64
 29  rougher.state.floatbank10_d_air             5290 non-null   float64
 30  rougher.state.floatbank10_d_level           5290 non-null   float64
 31  rougher.state.floatbank10_e_air             5290 non-null   float64
 32  rougher.state.floatbank10_e_level           5290 non-null   float64
 33  rougher.state.floatbank10_f_air             5290 non-null   float64
 34  rougher.state.floatbank10_f_level           5290 non-null   float64
 35  secondary_cleaner.state.floatbank2_a_air    5287 non-null   float64
 36  secondary_cleaner.state.floatbank2_a_level  5290 non-null   float64
 37  secondary_cleaner.state.floatbank2_b_air    5288 non-null   float64
 38  secondary_cleaner.state.floatbank2_b_level  5290 non-null   float64
 39  secondary_cleaner.state.floatbank3_a_air    5281 non-null   float64
 40  secondary_cleaner.state.floatbank3_a_level  5290 non-null   float64
 41  secondary_cleaner.state.floatbank3_b_air    5290 non-null   float64
 42  secondary_cleaner.state.floatbank3_b_level  5290 non-null   float64
 43  secondary_cleaner.state.floatbank4_a_air    5290 non-null   float64
 44  secondary_cleaner.state.floatbank4_a_level  5290 non-null   float64
 45  secondary_cleaner.state.floatbank4_b_air    5290 non-null   float64
 46  secondary_cleaner.state.floatbank4_b_level  5290 non-null   float64
 47  secondary_cleaner.state.floatbank5_a_air    5290 non-null   float64
 48  secondary_cleaner.state.floatbank5_a_level  5290 non-null   float64
 49  secondary_cleaner.state.floatbank5_b_air    5290 non-null   float64
 50  secondary_cleaner.state.floatbank5_b_level  5290 non-null   float64
 51  secondary_cleaner.state.floatbank6_a_air    5290 non-null   float64
 52  secondary_cleaner.state.floatbank6_a_level  5290 non-null   float64
dtypes: float64(52), object(1)
memory usage: 2.1+ MB
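
The `info()` output above reports `date` with dtype `object`, meaning the timestamps are stored as strings. A minimal sketch of parsing them, assuming the format seen in the `head()` output; `toy_test` is a made-up three-row stand-in, not the real `gold_test` frame:

```python
import pandas as pd

# Toy stand-in for gold_test (hypothetical name); timestamps copied from head().
toy_test = pd.DataFrame({
    "date": ["2016-09-01 00:59:59", "2016-09-01 01:59:59", "2016-09-01 02:59:59"],
    "rougher.input.feed_au": [12.08, 11.92, 12.09],
})

# Parse the strings into real timestamps using an explicit format.
toy_test["date"] = pd.to_datetime(toy_test["date"], format="%Y-%m-%d %H:%M:%S")

# With a datetime column, the spacing between measurements can be checked directly.
steps = toy_test["date"].diff().dropna()
print(toy_test.dtypes)
print(steps.unique())
```

An explicit `format` makes the parse fail loudly if a row deviates from the expected layout, which may be preferable to silent inference.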
In [17]:
gold_test.isna().sum()
Out[17]:
date                                           0
primary_cleaner.input.sulfate                  4
primary_cleaner.input.depressant               5
primary_cleaner.input.feed_size                0
primary_cleaner.input.xanthate                 4
primary_cleaner.state.floatbank8_a_air         0
primary_cleaner.state.floatbank8_a_level       0
primary_cleaner.state.floatbank8_b_air         0
primary_cleaner.state.floatbank8_b_level       0
primary_cleaner.state.floatbank8_c_air         0
primary_cleaner.state.floatbank8_c_level       0
primary_cleaner.state.floatbank8_d_air         0
primary_cleaner.state.floatbank8_d_level       0
rougher.input.feed_ag                          0
rougher.input.feed_pb                          0
rougher.input.feed_rate                        3
rougher.input.feed_size                        1
rougher.input.feed_sol                        21
rougher.input.feed_au                          0
rougher.input.floatbank10_sulfate              5
rougher.input.floatbank10_xanthate             0
rougher.input.floatbank11_sulfate              8
rougher.input.floatbank11_xanthate            25
rougher.state.floatbank10_a_air                0
rougher.state.floatbank10_a_level              0
rougher.state.floatbank10_b_air                0
rougher.state.floatbank10_b_level              0
rougher.state.floatbank10_c_air                0
rougher.state.floatbank10_c_level              0
rougher.state.floatbank10_d_air                0
rougher.state.floatbank10_d_level              0
rougher.state.floatbank10_e_air                0
rougher.state.floatbank10_e_level              0
rougher.state.floatbank10_f_air                0
rougher.state.floatbank10_f_level              0
secondary_cleaner.state.floatbank2_a_air       3
secondary_cleaner.state.floatbank2_a_level     0
secondary_cleaner.state.floatbank2_b_air       2
secondary_cleaner.state.floatbank2_b_level     0
secondary_cleaner.state.floatbank3_a_air       9
secondary_cleaner.state.floatbank3_a_level     0
secondary_cleaner.state.floatbank3_b_air       0
secondary_cleaner.state.floatbank3_b_level     0
secondary_cleaner.state.floatbank4_a_air       0
secondary_cleaner.state.floatbank4_a_level     0
secondary_cleaner.state.floatbank4_b_air       0
secondary_cleaner.state.floatbank4_b_level     0
secondary_cleaner.state.floatbank5_a_air       0
secondary_cleaner.state.floatbank5_a_level     0
secondary_cleaner.state.floatbank5_b_air       0
secondary_cleaner.state.floatbank5_b_level     0
secondary_cleaner.state.floatbank6_a_air       0
secondary_cleaner.state.floatbank6_a_level     0
dtype: int64
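
Raw counts are easier to judge as a share of rows: the largest count above, 25 gaps in `rougher.input.floatbank11_xanthate` out of 5290 rows, is under 0.5%. A small sketch of such a report; `missing_report` is a hypothetical helper and `toy` is made-up data, neither comes from the notebook:

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Return only the columns that have gaps: count and percent of rows, largest first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "percent": counts / len(df) * 100})
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Made-up frame used only to demonstrate the helper.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [np.nan, np.nan, 3.0, 4.0],
})

report = missing_report(toy)
print(report)
```

On the real data the call would presumably be `missing_report(gold_test)`.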
In [18]:
gold_test.describe()
Out[18]:
primary_cleaner.input.sulfate primary_cleaner.input.depressant primary_cleaner.input.feed_size primary_cleaner.input.xanthate primary_cleaner.state.floatbank8_a_air primary_cleaner.state.floatbank8_a_level primary_cleaner.state.floatbank8_b_air primary_cleaner.state.floatbank8_b_level primary_cleaner.state.floatbank8_c_air primary_cleaner.state.floatbank8_c_level primary_cleaner.state.floatbank8_d_air primary_cleaner.state.floatbank8_d_level rougher.input.feed_ag rougher.input.feed_pb rougher.input.feed_rate rougher.input.feed_size rougher.input.feed_sol rougher.input.feed_au rougher.input.floatbank10_sulfate rougher.input.floatbank10_xanthate rougher.input.floatbank11_sulfate rougher.input.floatbank11_xanthate rougher.state.floatbank10_a_air rougher.state.floatbank10_a_level rougher.state.floatbank10_b_air rougher.state.floatbank10_b_level rougher.state.floatbank10_c_air rougher.state.floatbank10_c_level rougher.state.floatbank10_d_air rougher.state.floatbank10_d_level rougher.state.floatbank10_e_air rougher.state.floatbank10_e_level rougher.state.floatbank10_f_air rougher.state.floatbank10_f_level secondary_cleaner.state.floatbank2_a_air secondary_cleaner.state.floatbank2_a_level secondary_cleaner.state.floatbank2_b_air secondary_cleaner.state.floatbank2_b_level secondary_cleaner.state.floatbank3_a_air secondary_cleaner.state.floatbank3_a_level secondary_cleaner.state.floatbank3_b_air secondary_cleaner.state.floatbank3_b_level secondary_cleaner.state.floatbank4_a_air secondary_cleaner.state.floatbank4_a_level secondary_cleaner.state.floatbank4_b_air secondary_cleaner.state.floatbank4_b_level secondary_cleaner.state.floatbank5_a_air secondary_cleaner.state.floatbank5_a_level secondary_cleaner.state.floatbank5_b_air secondary_cleaner.state.floatbank5_b_level secondary_cleaner.state.floatbank6_a_air secondary_cleaner.state.floatbank6_a_level
count 5,286.00 5,285.00 5,290.00 5,286.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,287.00 5,289.00 5,269.00 5,290.00 5,285.00 5,290.00 5,282.00 5,265.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,287.00 5,290.00 5,288.00 5,290.00 5,281.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00 5,290.00
mean 174.84 8.68 7.27 1.38 1,539.49 -497.67 1,545.17 -500.27 1,527.27 -498.33 1,544.84 -500.20 9.37 3.80 489.80 55.95 37.72 9.33 13.86 6.47 13.92 6.51 1,053.40 -395.73 1,318.93 -475.24 1,301.60 -474.84 1,214.85 -475.68 1,056.65 -469.03 997.95 -470.48 26.31 -502.75 21.95 -501.99 25.40 -507.49 20.98 -506.63 16.32 -505.14 13.74 -463.35 12.80 -501.33 9.88 -495.66 17.30 -501.79
std 43.03 3.07 0.61 0.64 116.80 19.95 122.22 32.97 122.54 21.96 124.77 31.05 1.93 0.95 108.04 19.08 5.49 1.62 3.35 1.07 3.22 0.89 121.14 91.09 156.45 45.65 171.27 45.86 185.76 47.84 131.54 59.33 128.22 60.76 3.43 28.76 4.35 34.58 6.53 47.62 6.74 44.53 3.49 31.43 3.43 86.19 3.03 17.95 2.87 34.54 4.54 39.04
min 2.57 0.00 5.65 0.00 0.00 -795.32 0.00 -800.00 0.00 -799.96 0.00 -799.79 0.57 0.27 0.00 0.05 1.39 0.57 0.00 0.00 0.00 0.01 -0.04 -657.95 -0.72 -650.25 -0.06 -647.54 -0.99 -648.39 -1.98 -649.27 -2.59 -649.95 0.21 -784.09 0.01 -797.78 0.00 -799.76 0.00 -809.33 0.00 -799.80 0.00 -800.84 0.07 -797.32 0.53 -800.22 -0.08 -809.74
25% 147.12 6.49 6.89 0.91 1,498.94 -500.36 1,498.97 -500.70 1,473.23 -501.02 1,499.48 -500.45 8.11 3.24 407.02 43.91 34.51 8.21 12.00 6.00 12.00 6.00 999.21 -499.92 1,200.87 -500.26 1,199.65 -500.23 1,093.37 -500.44 999.36 -500.19 901.02 -500.62 24.94 -500.21 20.00 -500.22 22.98 -500.30 17.97 -500.15 14.04 -500.87 12.03 -500.32 10.91 -500.73 8.04 -500.19 14.00 -500.69
50% 177.83 8.05 7.25 1.20 1,585.13 -499.97 1,595.62 -500.03 1,549.59 -500.02 1,594.58 -500.02 9.76 3.74 499.05 50.84 37.98 9.59 14.00 6.50 14.00 6.50 1,000.47 -399.69 1,302.25 -499.84 1,300.20 -499.78 1,207.01 -499.69 1,047.50 -499.77 999.44 -499.68 26.91 -500.00 22.94 -500.02 25.01 -500.03 21.00 -500.01 17.01 -500.12 14.96 -499.58 12.95 -499.99 10.00 -499.99 16.01 -500.01
75% 208.13 10.03 7.60 1.80 1,602.08 -499.57 1,602.32 -499.29 1,601.14 -498.99 1,600.96 -499.61 10.65 4.28 575.31 62.43 41.64 10.46 16.97 7.09 16.96 7.09 1,006.25 -300.06 1,433.96 -450.75 1,406.59 -451.15 1,391.50 -452.48 1,101.37 -450.96 1,050.43 -451.99 28.09 -499.79 24.99 -499.83 30.00 -499.78 26.98 -499.89 18.04 -499.40 15.96 -400.93 15.10 -499.28 12.00 -499.72 21.02 -499.37
max 265.98 40.00 15.50 4.10 2,103.10 -57.20 1,813.08 -142.53 1,715.05 -150.94 1,913.26 -158.95 14.41 6.91 707.36 392.49 53.48 13.73 24.28 8.91 24.28 8.62 1,423.27 -273.78 1,706.64 -298.20 1,731.02 -298.04 1,775.22 -76.40 1,467.18 -139.75 1,476.59 -249.80 32.14 -300.34 28.17 -212.00 40.04 -313.87 32.04 -202.28 30.05 -401.57 31.27 -6.51 25.26 -244.48 14.09 -137.74 26.71 -123.31


The gold_full dataset

In [19]:
gold_full.head()
Out[19]:
date final.output.concentrate_ag final.output.concentrate_pb final.output.concentrate_sol final.output.concentrate_au final.output.recovery final.output.tail_ag final.output.tail_pb final.output.tail_sol final.output.tail_au primary_cleaner.input.sulfate primary_cleaner.input.depressant primary_cleaner.input.feed_size primary_cleaner.input.xanthate primary_cleaner.output.concentrate_ag primary_cleaner.output.concentrate_pb primary_cleaner.output.concentrate_sol primary_cleaner.output.concentrate_au primary_cleaner.output.tail_ag primary_cleaner.output.tail_pb primary_cleaner.output.tail_sol primary_cleaner.output.tail_au primary_cleaner.state.floatbank8_a_air primary_cleaner.state.floatbank8_a_level primary_cleaner.state.floatbank8_b_air primary_cleaner.state.floatbank8_b_level primary_cleaner.state.floatbank8_c_air primary_cleaner.state.floatbank8_c_level primary_cleaner.state.floatbank8_d_air primary_cleaner.state.floatbank8_d_level rougher.calculation.sulfate_to_au_concentrate rougher.calculation.floatbank10_sulfate_to_au_feed rougher.calculation.floatbank11_sulfate_to_au_feed rougher.calculation.au_pb_ratio rougher.input.feed_ag rougher.input.feed_pb rougher.input.feed_rate rougher.input.feed_size rougher.input.feed_sol rougher.input.feed_au rougher.input.floatbank10_sulfate rougher.input.floatbank10_xanthate rougher.input.floatbank11_sulfate rougher.input.floatbank11_xanthate rougher.output.concentrate_ag rougher.output.concentrate_pb rougher.output.concentrate_sol rougher.output.concentrate_au rougher.output.recovery rougher.output.tail_ag rougher.output.tail_pb rougher.output.tail_sol rougher.output.tail_au rougher.state.floatbank10_a_air rougher.state.floatbank10_a_level rougher.state.floatbank10_b_air rougher.state.floatbank10_b_level rougher.state.floatbank10_c_air rougher.state.floatbank10_c_level rougher.state.floatbank10_d_air rougher.state.floatbank10_d_level rougher.state.floatbank10_e_air rougher.state.floatbank10_e_level rougher.state.floatbank10_f_air rougher.state.floatbank10_f_level secondary_cleaner.output.tail_ag secondary_cleaner.output.tail_pb secondary_cleaner.output.tail_sol secondary_cleaner.output.tail_au secondary_cleaner.state.floatbank2_a_air secondary_cleaner.state.floatbank2_a_level secondary_cleaner.state.floatbank2_b_air secondary_cleaner.state.floatbank2_b_level secondary_cleaner.state.floatbank3_a_air secondary_cleaner.state.floatbank3_a_level secondary_cleaner.state.floatbank3_b_air secondary_cleaner.state.floatbank3_b_level secondary_cleaner.state.floatbank4_a_air secondary_cleaner.state.floatbank4_a_level secondary_cleaner.state.floatbank4_b_air secondary_cleaner.state.floatbank4_b_level secondary_cleaner.state.floatbank5_a_air secondary_cleaner.state.floatbank5_a_level secondary_cleaner.state.floatbank5_b_air secondary_cleaner.state.floatbank5_b_level secondary_cleaner.state.floatbank6_a_air secondary_cleaner.state.floatbank6_a_level
0 2016-01-15 00:00:00 6.06 9.89 5.51 42.19 70.54 10.41 0.90 16.90 2.14 127.09 10.13 7.25 0.99 8.55 10.39 19.53 34.17 14.94 2.53 7.48 2.11 1,549.78 -498.91 1,551.43 -516.40 1,549.87 -498.67 1,554.37 -493.43 41,885.71 3,481.78 3,520.34 2.84 6.10 2.28 523.55 55.49 36.81 6.49 11.99 6.01 11.84 6.01 11.50 7.10 28.03 19.79 87.11 5.01 0.51 19.15 1.17 999.71 -404.07 1,603.01 -434.72 1,602.38 -442.20 1,598.94 -451.29 1,404.47 -455.46 1,416.35 -451.94 14.50 4.69 8.76 2.61 25.85 -498.53 23.89 -501.41 23.96 -495.26 21.94 -499.34 14.02 -502.49 12.10 -504.72 9.93 -498.31 8.08 -500.47 14.15 -605.84
1 2016-01-15 01:00:00 6.03 9.97 5.26 42.70 69.27 10.46 0.93 16.63 2.22 125.63 10.30 7.25 1.00 8.56 10.50 19.37 34.12 16.25 3.05 6.73 2.35 1,576.17 -500.90 1,575.95 -499.87 1,575.99 -499.32 1,574.48 -498.93 42,050.86 3,498.37 3,489.98 2.86 6.16 2.27 525.29 57.28 35.75 6.48 11.97 6.01 12.00 6.01 11.62 7.28 28.07 20.05 86.84 4.96 0.54 18.97 1.18 1,000.29 -400.07 1,600.75 -449.95 1,600.48 -449.83 1,600.53 -449.95 1,399.23 -450.87 1,399.72 -450.12 14.27 4.59 9.00 2.49 25.88 -499.99 23.89 -500.37 23.97 -500.09 22.09 -499.45 13.99 -505.50 11.95 -501.33 10.04 -500.17 7.98 -500.58 14.00 -599.79
2 2016-01-15 02:00:00 6.06 10.21 5.38 42.66 68.12 10.51 0.95 16.21 2.26 123.82 11.32 7.25 0.99 8.60 10.35 19.17 33.97 16.49 3.12 6.47 2.42 1,601.56 -500.00 1,600.39 -500.61 1,602.00 -500.87 1,599.54 -499.83 42,018.10 3,495.35 3,502.36 2.95 6.12 2.16 530.03 57.51 35.97 6.36 11.92 6.20 11.92 6.20 11.70 7.22 27.45 19.74 86.84 4.84 0.55 18.81 1.16 999.72 -400.07 1,599.34 -450.01 1,599.67 -449.95 1,599.85 -449.95 1,399.18 -449.94 1,400.32 -450.53 14.12 4.62 8.84 2.46 26.01 -499.93 23.89 -499.95 23.91 -499.44 23.96 -499.90 14.02 -502.52 11.91 -501.13 10.07 -500.13 8.01 -500.52 14.03 -601.43
3 2016-01-15 03:00:00 6.05 9.98 4.86 42.69 68.35 10.42 0.88 16.53 2.15 122.27 11.32 7.25 1.00 7.22 8.50 15.98 28.26 16.02 2.96 6.84 2.26 1,599.97 -500.95 1,600.66 -499.68 1,600.30 -500.73 1,600.45 -500.05 42,029.45 3,498.58 3,499.16 3.00 6.04 2.04 542.59 57.79 36.86 6.12 11.63 6.20 11.69 6.20 11.92 7.18 27.34 19.32 87.23 4.66 0.54 19.33 1.08 999.81 -400.20 1,600.06 -450.62 1,600.01 -449.91 1,597.73 -450.13 1,400.94 -450.03 1,400.23 -449.79 13.73 4.48 9.12 2.32 25.94 -499.18 23.96 -499.85 23.97 -500.01 23.95 -499.94 14.04 -500.86 12.00 -501.19 9.97 -499.20 7.98 -500.26 14.01 -600.00
4 2016-01-15 04:00:00 6.15 10.14 4.94 42.77 66.93 10.36 0.79 16.53 2.06 117.99 11.91 7.25 1.01 9.09 9.99 19.20 33.04 16.48 3.11 6.55 2.28 1,601.34 -498.98 1,601.44 -500.32 1,599.58 -500.89 1,602.65 -500.59 42,125.35 3,494.80 3,506.68 3.17 6.06 1.79 540.53 56.05 34.35 5.66 10.96 6.20 10.96 6.19 12.41 7.24 27.04 19.22 86.69 4.55 0.52 19.27 1.01 999.68 -399.75 1,600.21 -449.60 1,600.36 -450.03 1,599.76 -449.91 1,401.56 -448.88 1,401.16 -450.41 14.08 4.47 8.87 2.33 26.02 -500.28 23.96 -500.59 23.99 -500.08 23.96 -499.99 14.03 -499.84 11.95 -501.05 9.93 -501.69 7.89 -500.36 14.00 -601.50
In [20]:
gold_full.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19439 entries, 0 to 19438
Data columns (total 87 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   date                                                19439 non-null  object 
 1   final.output.concentrate_ag                         19438 non-null  float64
 2   final.output.concentrate_pb                         19438 non-null  float64
 3   final.output.concentrate_sol                        19228 non-null  float64
 4   final.output.concentrate_au                         19439 non-null  float64
 5   final.output.recovery                               19439 non-null  float64
 6   final.output.tail_ag                                19438 non-null  float64
 7   final.output.tail_pb                                19338 non-null  float64
 8   final.output.tail_sol                               19433 non-null  float64
 9   final.output.tail_au                                19439 non-null  float64
 10  primary_cleaner.input.sulfate                       19415 non-null  float64
 11  primary_cleaner.input.depressant                    19402 non-null  float64
 12  primary_cleaner.input.feed_size                     19439 non-null  float64
 13  primary_cleaner.input.xanthate                      19335 non-null  float64
 14  primary_cleaner.output.concentrate_ag               19439 non-null  float64
 15  primary_cleaner.output.concentrate_pb               19323 non-null  float64
 16  primary_cleaner.output.concentrate_sol              19069 non-null  float64
 17  primary_cleaner.output.concentrate_au               19439 non-null  float64
 18  primary_cleaner.output.tail_ag                      19435 non-null  float64
 19  primary_cleaner.output.tail_pb                      19418 non-null  float64
 20  primary_cleaner.output.tail_sol                     19377 non-null  float64
 21  primary_cleaner.output.tail_au                      19439 non-null  float64
 22  primary_cleaner.state.floatbank8_a_air              19435 non-null  float64
 23  primary_cleaner.state.floatbank8_a_level            19438 non-null  float64
 24  primary_cleaner.state.floatbank8_b_air              19435 non-null  float64
 25  primary_cleaner.state.floatbank8_b_level            19438 non-null  float64
 26  primary_cleaner.state.floatbank8_c_air              19437 non-null  float64
 27  primary_cleaner.state.floatbank8_c_level            19438 non-null  float64
 28  primary_cleaner.state.floatbank8_d_air              19436 non-null  float64
 29  primary_cleaner.state.floatbank8_d_level            19438 non-null  float64
 30  rougher.calculation.sulfate_to_au_concentrate       19437 non-null  float64
 31  rougher.calculation.floatbank10_sulfate_to_au_feed  19437 non-null  float64
 32  rougher.calculation.floatbank11_sulfate_to_au_feed  19437 non-null  float64
 33  rougher.calculation.au_pb_ratio                     19439 non-null  float64
 34  rougher.input.feed_ag                               19439 non-null  float64
 35  rougher.input.feed_pb                               19339 non-null  float64
 36  rougher.input.feed_rate                             19428 non-null  float64
 37  rougher.input.feed_size                             19294 non-null  float64
 38  rougher.input.feed_sol                              19340 non-null  float64
 39  rougher.input.feed_au                               19439 non-null  float64
 40  rougher.input.floatbank10_sulfate                   19405 non-null  float64
 41  rougher.input.floatbank10_xanthate                  19431 non-null  float64
 42  rougher.input.floatbank11_sulfate                   19395 non-null  float64
 43  rougher.input.floatbank11_xanthate                  18986 non-null  float64
 44  rougher.output.concentrate_ag                       19439 non-null  float64
 45  rougher.output.concentrate_pb                       19439 non-null  float64
 46  rougher.output.concentrate_sol                      19416 non-null  float64
 47  rougher.output.concentrate_au                       19439 non-null  float64
 48  rougher.output.recovery                             19439 non-null  float64
 49  rougher.output.tail_ag                              19438 non-null  float64
 50  rougher.output.tail_pb                              19439 non-null  float64
 51  rougher.output.tail_sol                             19439 non-null  float64
 52  rougher.output.tail_au                              19439 non-null  float64
 53  rougher.state.floatbank10_a_air                     19438 non-null  float64
 54  rougher.state.floatbank10_a_level                   19438 non-null  float64
 55  rougher.state.floatbank10_b_air                     19438 non-null  float64
 56  rougher.state.floatbank10_b_level                   19438 non-null  float64
 57  rougher.state.floatbank10_c_air                     19438 non-null  float64
 58  rougher.state.floatbank10_c_level                   19438 non-null  float64
 59  rougher.state.floatbank10_d_air                     19439 non-null  float64
 60  rougher.state.floatbank10_d_level                   19439 non-null  float64
 61  rougher.state.floatbank10_e_air                     19003 non-null  float64
 62  rougher.state.floatbank10_e_level                   19439 non-null  float64
 63  rougher.state.floatbank10_f_air                     19439 non-null  float64
 64  rougher.state.floatbank10_f_level                   19439 non-null  float64
 65  secondary_cleaner.output.tail_ag                    19437 non-null  float64
 66  secondary_cleaner.output.tail_pb                    19427 non-null  float64
 67  secondary_cleaner.output.tail_sol                   17691 non-null  float64
 68  secondary_cleaner.output.tail_au                    19439 non-null  float64
 69  secondary_cleaner.state.floatbank2_a_air            19219 non-null  float64
 70  secondary_cleaner.state.floatbank2_a_level          19438 non-null  float64
 71  secondary_cleaner.state.floatbank2_b_air            19416 non-null  float64
 72  secondary_cleaner.state.floatbank2_b_level          19438 non-null  float64
 73  secondary_cleaner.state.floatbank3_a_air            19426 non-null  float64
 74  secondary_cleaner.state.floatbank3_a_level          19438 non-null  float64
 75  secondary_cleaner.state.floatbank3_b_air            19438 non-null  float64
 76  secondary_cleaner.state.floatbank3_b_level          19438 non-null  float64
 77  secondary_cleaner.state.floatbank4_a_air            19433 non-null  float64
 78  secondary_cleaner.state.floatbank4_a_level          19438 non-null  float64
 79  secondary_cleaner.state.floatbank4_b_air            19438 non-null  float64
 80  secondary_cleaner.state.floatbank4_b_level          19438 non-null  float64
 81  secondary_cleaner.state.floatbank5_a_air            19438 non-null  float64
 82  secondary_cleaner.state.floatbank5_a_level          19438 non-null  float64
 83  secondary_cleaner.state.floatbank5_b_air            19438 non-null  float64
 84  secondary_cleaner.state.floatbank5_b_level          19438 non-null  float64
 85  secondary_cleaner.state.floatbank6_a_air            19437 non-null  float64
 86  secondary_cleaner.state.floatbank6_a_level          19438 non-null  float64
dtypes: float64(86), object(1)
memory usage: 12.9+ MB
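
`gold_full` has 87 columns, while `gold_test` has 53. Judging by the two `info()` listings, the extra columns are the `output` and `calculation` parameters. A sketch of how the difference could be listed and grouped by parameter type; the two toy frames below carry only a handful of the real column names and are assumptions made for illustration:

```python
import pandas as pd

# Empty toy frames: only the column names matter for this comparison.
toy_full = pd.DataFrame(columns=[
    "date",
    "rougher.input.feed_au",
    "rougher.output.recovery",
    "final.output.recovery",
    "rougher.calculation.au_pb_ratio",
])
toy_test = pd.DataFrame(columns=["date", "rougher.input.feed_au"])

# Columns present in the full set but absent from the test set (returned sorted).
missing_in_test = toy_full.columns.difference(toy_test.columns)

# Column names follow "stage.parameter_type.name"; count the missing ones by type.
by_type = pd.Series([name.split(".")[1] for name in missing_in_test]).value_counts()

print(list(missing_in_test))
print(by_type)
```

On the real frames the same two lines would presumably run as `gold_full.columns.difference(gold_test.columns)`.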
In [21]:
gold_full.isna().sum()
Out[21]:
date                                                     0
final.output.concentrate_ag                              1
final.output.concentrate_pb                              1
final.output.concentrate_sol                           211
final.output.concentrate_au                              0
final.output.recovery                                    0
final.output.tail_ag                                     1
final.output.tail_pb                                   101
final.output.tail_sol                                    6
final.output.tail_au                                     0
primary_cleaner.input.sulfate                           24
primary_cleaner.input.depressant                        37
primary_cleaner.input.feed_size                          0
primary_cleaner.input.xanthate                         104
primary_cleaner.output.concentrate_ag                    0
primary_cleaner.output.concentrate_pb                  116
primary_cleaner.output.concentrate_sol                 370
primary_cleaner.output.concentrate_au                    0
primary_cleaner.output.tail_ag                           4
primary_cleaner.output.tail_pb                          21
primary_cleaner.output.tail_sol                         62
primary_cleaner.output.tail_au                           0
primary_cleaner.state.floatbank8_a_air                   4
primary_cleaner.state.floatbank8_a_level                 1
primary_cleaner.state.floatbank8_b_air                   4
primary_cleaner.state.floatbank8_b_level                 1
primary_cleaner.state.floatbank8_c_air                   2
primary_cleaner.state.floatbank8_c_level                 1
primary_cleaner.state.floatbank8_d_air                   3
primary_cleaner.state.floatbank8_d_level                 1
rougher.calculation.sulfate_to_au_concentrate            2
rougher.calculation.floatbank10_sulfate_to_au_feed       2
rougher.calculation.floatbank11_sulfate_to_au_feed       2
rougher.calculation.au_pb_ratio                          0
rougher.input.feed_ag                                    0
rougher.input.feed_pb                                  100
rougher.input.feed_rate                                 11
rougher.input.feed_size                                145
rougher.input.feed_sol                                  99
rougher.input.feed_au                                    0
rougher.input.floatbank10_sulfate                       34
rougher.input.floatbank10_xanthate                       8
rougher.input.floatbank11_sulfate                       44
rougher.input.floatbank11_xanthate                     453
rougher.output.concentrate_ag                            0
rougher.output.concentrate_pb                            0
rougher.output.concentrate_sol                          23
rougher.output.concentrate_au                            0
rougher.output.recovery                                  0
rougher.output.tail_ag                                   1
rougher.output.tail_pb                                   0
rougher.output.tail_sol                                  0
rougher.output.tail_au                                   0
rougher.state.floatbank10_a_air                          1
rougher.state.floatbank10_a_level                        1
rougher.state.floatbank10_b_air                          1
rougher.state.floatbank10_b_level                        1
rougher.state.floatbank10_c_air                          1
rougher.state.floatbank10_c_level                        1
rougher.state.floatbank10_d_air                          0
rougher.state.floatbank10_d_level                        0
rougher.state.floatbank10_e_air                        436
rougher.state.floatbank10_e_level                        0
rougher.state.floatbank10_f_air                          0
rougher.state.floatbank10_f_level                        0
secondary_cleaner.output.tail_ag                         2
secondary_cleaner.output.tail_pb                        12
secondary_cleaner.output.tail_sol                     1748
secondary_cleaner.output.tail_au                         0
secondary_cleaner.state.floatbank2_a_air               220
secondary_cleaner.state.floatbank2_a_level               1
secondary_cleaner.state.floatbank2_b_air                23
secondary_cleaner.state.floatbank2_b_level               1
secondary_cleaner.state.floatbank3_a_air                13
secondary_cleaner.state.floatbank3_a_level               1
secondary_cleaner.state.floatbank3_b_air                 1
secondary_cleaner.state.floatbank3_b_level               1
secondary_cleaner.state.floatbank4_a_air                 6
secondary_cleaner.state.floatbank4_a_level               1
secondary_cleaner.state.floatbank4_b_air                 1
secondary_cleaner.state.floatbank4_b_level               1
secondary_cleaner.state.floatbank5_a_air                 1
secondary_cleaner.state.floatbank5_a_level               1
secondary_cleaner.state.floatbank5_b_air                 1
secondary_cleaner.state.floatbank5_b_level               1
secondary_cleaner.state.floatbank6_a_air                 2
secondary_cleaner.state.floatbank6_a_level               1
dtype: int64
In [22]:
gold_full.describe()
Out[22]:
final.output.concentrate_ag final.output.concentrate_pb final.output.concentrate_sol final.output.concentrate_au final.output.recovery final.output.tail_ag final.output.tail_pb final.output.tail_sol final.output.tail_au primary_cleaner.input.sulfate primary_cleaner.input.depressant primary_cleaner.input.feed_size primary_cleaner.input.xanthate primary_cleaner.output.concentrate_ag primary_cleaner.output.concentrate_pb primary_cleaner.output.concentrate_sol primary_cleaner.output.concentrate_au primary_cleaner.output.tail_ag primary_cleaner.output.tail_pb primary_cleaner.output.tail_sol primary_cleaner.output.tail_au primary_cleaner.state.floatbank8_a_air primary_cleaner.state.floatbank8_a_level primary_cleaner.state.floatbank8_b_air primary_cleaner.state.floatbank8_b_level primary_cleaner.state.floatbank8_c_air primary_cleaner.state.floatbank8_c_level primary_cleaner.state.floatbank8_d_air primary_cleaner.state.floatbank8_d_level rougher.calculation.sulfate_to_au_concentrate rougher.calculation.floatbank10_sulfate_to_au_feed rougher.calculation.floatbank11_sulfate_to_au_feed rougher.calculation.au_pb_ratio rougher.input.feed_ag rougher.input.feed_pb rougher.input.feed_rate rougher.input.feed_size rougher.input.feed_sol rougher.input.feed_au rougher.input.floatbank10_sulfate rougher.input.floatbank10_xanthate rougher.input.floatbank11_sulfate rougher.input.floatbank11_xanthate rougher.output.concentrate_ag rougher.output.concentrate_pb rougher.output.concentrate_sol rougher.output.concentrate_au rougher.output.recovery rougher.output.tail_ag rougher.output.tail_pb rougher.output.tail_sol rougher.output.tail_au rougher.state.floatbank10_a_air rougher.state.floatbank10_a_level rougher.state.floatbank10_b_air rougher.state.floatbank10_b_level rougher.state.floatbank10_c_air rougher.state.floatbank10_c_level rougher.state.floatbank10_d_air rougher.state.floatbank10_d_level rougher.state.floatbank10_e_air rougher.state.floatbank10_e_level rougher.state.floatbank10_f_air 
rougher.state.floatbank10_f_level secondary_cleaner.output.tail_ag secondary_cleaner.output.tail_pb secondary_cleaner.output.tail_sol secondary_cleaner.output.tail_au secondary_cleaner.state.floatbank2_a_air secondary_cleaner.state.floatbank2_a_level secondary_cleaner.state.floatbank2_b_air secondary_cleaner.state.floatbank2_b_level secondary_cleaner.state.floatbank3_a_air secondary_cleaner.state.floatbank3_a_level secondary_cleaner.state.floatbank3_b_air secondary_cleaner.state.floatbank3_b_level secondary_cleaner.state.floatbank4_a_air secondary_cleaner.state.floatbank4_a_level secondary_cleaner.state.floatbank4_b_air secondary_cleaner.state.floatbank4_b_level secondary_cleaner.state.floatbank5_a_air secondary_cleaner.state.floatbank5_a_level secondary_cleaner.state.floatbank5_b_air secondary_cleaner.state.floatbank5_b_level secondary_cleaner.state.floatbank6_a_air secondary_cleaner.state.floatbank6_a_level
count 19,438.00 19,438.00 19,228.00 19,439.00 19,439.00 19,438.00 19,338.00 19,433.00 19,439.00 19,415.00 19,402.00 19,439.00 19,335.00 19,439.00 19,323.00 19,069.00 19,439.00 19,435.00 19,418.00 19,377.00 19,439.00 19,435.00 19,438.00 19,435.00 19,438.00 19,437.00 19,438.00 19,436.00 19,438.00 19,437.00 19,437.00 19,437.00 19,439.00 19,439.00 19,339.00 19,428.00 19,294.00 19,340.00 19,439.00 19,405.00 19,431.00 19,395.00 18,986.00 19,439.00 19,439.00 19,416.00 19,439.00 19,439.00 19,438.00 19,439.00 19,439.00 19,439.00 19,438.00 19,438.00 19,438.00 19,438.00 19,438.00 19,438.00 19,439.00 19,439.00 19,003.00 19,439.00 19,439.00 19,439.00 19,437.00 19,427.00 17,691.00 19,439.00 19,219.00 19,438.00 19,416.00 19,438.00 19,426.00 19,438.00 19,438.00 19,438.00 19,433.00 19,438.00 19,438.00 19,438.00 19,438.00 19,438.00 19,438.00 19,438.00 19,437.00 19,438.00
mean 5.17 9.98 9.50 44.08 67.05 9.69 2.71 10.58 3.04 144.62 8.82 7.31 1.02 8.44 9.83 10.49 32.12 16.15 3.44 7.97 3.91 1,589.35 -491.20 1,591.34 -492.19 1,586.67 -491.98 1,542.91 -488.02 42,171.19 3,393.05 3,256.85 2.42 8.79 3.60 478.32 58.97 36.70 8.27 12.33 6.02 12.06 6.07 11.99 7.61 28.81 19.77 83.33 5.59 0.65 18.06 1.82 1,105.32 -376.61 1,320.22 -467.25 1,299.97 -467.72 1,211.56 -468.24 1,080.87 -466.04 1,025.27 -466.27 14.59 5.78 7.17 4.34 28.71 -502.37 24.11 -503.23 28.20 -486.20 22.20 -493.74 18.99 -485.89 15.01 -461.08 15.69 -488.68 12.20 -487.15 18.96 -505.44
std 1.37 1.67 2.79 5.13 10.13 2.33 0.95 2.87 0.92 44.46 3.29 0.61 0.51 2.05 2.56 3.91 5.63 3.55 1.39 2.21 1.59 129.00 32.43 131.81 33.79 136.11 32.71 246.20 43.96 324,362.11 4,943.85 5,781.42 0.81 1.94 1.05 105.37 21.63 5.15 1.96 3.43 1.13 3.78 1.10 2.73 1.80 5.94 3.75 14.15 1.11 0.25 3.45 0.68 160.83 93.91 176.29 54.67 202.80 53.53 204.02 53.92 172.15 57.38 164.49 57.85 4.27 2.77 3.92 2.33 5.46 53.57 5.75 56.57 6.13 54.41 5.57 43.37 5.41 47.76 4.89 67.41 5.51 34.53 5.33 38.35 5.55 37.69
min 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -798.64 0.00 -800.00 0.00 -799.96 0.00 -799.79 -42,235,197.37 -486.60 -264.98 -0.01 0.01 0.01 0.00 0.05 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.59 0.01 10.01 0.02 -0.04 -657.95 -0.72 -650.26 -0.06 -647.54 -0.99 -648.39 -1.98 -649.44 -2.59 -649.95 0.00 0.00 0.00 0.00 0.08 -799.61 0.00 -799.87 0.00 -799.76 0.00 -809.33 0.00 -799.80 0.00 -800.84 -0.37 -797.32 0.53 -800.22 -0.08 -809.74
25% 4.25 9.14 7.72 43.40 63.30 8.06 2.04 8.94 2.46 114.11 6.08 6.94 0.70 7.26 8.70 7.89 30.46 14.02 2.50 6.75 2.92 1,505.06 -500.31 1,503.63 -500.44 1,500.93 -500.70 1,494.22 -500.46 40,700.95 2,611.43 2,551.44 2.04 7.24 2.89 416.53 47.44 34.21 6.85 10.00 5.40 10.00 5.40 10.63 6.73 27.22 18.80 81.04 4.89 0.48 15.82 1.40 999.66 -499.84 1,199.63 -500.20 1,151.08 -500.22 1,061.05 -500.39 998.68 -500.23 901.00 -500.52 12.64 3.86 3.85 3.18 25.05 -500.23 20.92 -500.25 24.95 -500.21 19.00 -500.12 14.98 -500.70 11.94 -500.19 10.99 -500.46 8.97 -500.13 14.98 -500.73
50% 5.07 10.10 9.22 45.01 68.17 9.74 2.75 10.62 2.98 143.23 8.04 7.28 0.94 8.51 10.15 10.31 32.84 15.84 3.27 8.18 3.62 1,600.60 -499.93 1,600.52 -499.96 1,600.25 -499.91 1,599.45 -499.91 44,696.48 3,018.35 2,997.22 2.31 8.59 3.53 499.42 54.61 37.20 8.13 12.00 6.01 12.00 6.10 12.10 7.74 29.84 20.28 86.19 5.77 0.62 18.11 1.81 1,001.07 -300.26 1,301.58 -499.78 1,300.21 -499.71 1,201.57 -499.53 1,050.03 -499.66 999.92 -499.47 15.64 5.34 7.69 4.07 29.10 -499.97 25.04 -500.02 27.98 -499.93 22.01 -499.98 18.02 -499.84 14.97 -499.46 15.00 -499.80 11.02 -499.94 19.96 -500.05
75% 5.90 11.04 10.95 46.28 72.69 11.13 3.33 12.10 3.57 175.08 11.01 7.67 1.21 9.81 11.42 13.45 35.05 18.01 4.18 9.55 4.59 1,697.66 -499.43 1,699.22 -499.37 1,699.48 -498.86 1,698.52 -499.06 48,168.21 3,676.77 3,602.00 2.73 10.21 4.24 550.17 65.02 40.04 9.77 14.72 6.80 14.64 6.80 13.74 8.57 32.19 21.72 90.01 6.39 0.78 20.09 2.21 1,205.62 -299.98 1,448.63 -400.62 1,449.46 -400.99 1,352.88 -401.64 1,199.43 -401.16 1,099.72 -401.51 17.36 7.79 10.41 5.06 33.01 -499.67 28.01 -499.79 33.00 -499.33 26.00 -499.81 23.01 -498.25 19.03 -400.12 18.03 -498.38 14.02 -499.44 24.00 -499.50
max 16.00 17.03 19.62 52.76 100.00 19.55 5.80 22.32 8.25 265.98 40.00 15.50 4.10 16.08 17.08 22.46 45.93 29.46 9.63 22.28 17.79 2,103.10 -57.20 2,114.91 -142.53 2,013.16 -150.94 2,398.90 -30.60 3,428,098.94 629,638.98 718,684.96 39.38 14.60 7.14 717.51 484.97 53.48 13.73 36.12 9.70 37.98 9.70 24.48 13.62 38.35 28.82 100.00 12.72 3.78 66.12 9.69 1,521.98 -273.78 1,809.19 -296.38 2,499.13 -292.16 1,817.20 -76.40 1,922.64 -139.75 1,706.31 -191.72 23.26 17.04 26.00 26.81 52.65 -127.88 35.15 -212.00 44.26 -191.68 35.07 -159.74 30.12 -245.24 31.27 -6.51 43.71 -244.48 27.93 -137.74 32.19 -104.43
In [23]:
# Plot the distributions of the target features 'rougher.output.recovery' and 'final.output.recovery'

gold_full['rougher.output.recovery'].hist(figsize=(16, 5), alpha=0.7, bins=100, color='black', edgecolor='black')
gold_full['final.output.recovery'].hist(figsize=(16, 5), alpha=0.7, bins=100, color='gold', edgecolor='black')
plt.grid(True)
plt.legend(["Rougher concentrate recovery 'rougher.output.recovery'",
            "Final concentrate recovery 'final.output.recovery'"])
plt.xlabel('Recovery (%)')
plt.ylabel('Number of measurements')
plt.title('Distribution of rougher and final concentrate recovery in the full dataset')
plt.show()

Comment:

The data contain a small number of missing values. Every feature except the date column is of type float, so all features other than date are quantitative. No duplicates were found. The feature distributions vary widely. The target features are negatively skewed (long left tail) and do not follow any standard distribution.

Correlation analysis showed strong dependence between some features, driven by the metal concentrations in the ore, reagent dosing, and the sequential stages of the process. Predicting the targets rougher.output.recovery and final.output.recovery is a regression task.

Next we verify that recovery is calculated correctly. We compute it on the training set for rougher.output.recovery, find the mean absolute error (MAE) between our calculation and the stored values, and draw conclusions from the result.

C: share of gold in the concentrate after flotation/cleaning (rougher.output.concentrate_au);
F: share of gold in the feed/concentrate before flotation/cleaning (rougher.input.feed_au);
T: share of gold in the tailings after flotation/cleaning (rougher.output.tail_au).

$$ recovery = \frac{C \cdot (F - T)}{F \cdot (C - T)} \cdot 100\% $$

The mean absolute error (MAE) does not depend on the model type and is computed as follows:
mae = mean_absolute_error(target_valid, predicted_valid)
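
For reference, MAE is simply the mean of the absolute differences between the true and the predicted values. Below is a minimal sketch with made-up numbers; the helper name `mae_by_hand` and the arrays are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def mae_by_hand(y_true, y_pred):
    """Mean absolute error: the average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative values only (not taken from the dataset)
example_true = [80.0, 85.0, 90.0]
example_pred = [82.0, 85.0, 87.0]
example_mae = mae_by_hand(example_true, example_pred)  # (2 + 0 + 3) / 3
print(example_mae)
```

For equal-length one-dimensional inputs this should agree with `sklearn.metrics.mean_absolute_error`.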

Let us check that the recovery parameter is calculated correctly

In [24]:
def apply_recovery(row):
    (input_au,
     output_au,
     output_tail) = (row["rougher.input.feed_au"],
                     row["rougher.output.concentrate_au"],
                     row["rougher.output.tail_au"])
    recovery_metric = (((output_au) * ((input_au) - (output_tail)))/
                       (((input_au) * ((output_au) - (output_tail))))) * 100
    return recovery_metric
In [25]:
test = gold_train.dropna(subset = ["rougher.input.feed_au",
                                    "rougher.output.concentrate_au",
                                    "rougher.output.tail_au",
                                    "rougher.output.recovery"],axis = 0).apply(
                                                                        apply_recovery,axis = 1)
In [26]:
mean_absolute_error(gold_train.dropna(subset = ["rougher.input.feed_au",
                                    "rougher.output.concentrate_au",
                                    "rougher.output.tail_au",
                                    "rougher.output.recovery"])["rougher.output.recovery"],
                       test)
Out[26]:
9.73512347450521e-15

Comment:

The MAE on the training set is about 9.7e-15. It is nonzero only because of floating-point rounding, so the discrepancy can be neglected: the recovery values we computed match the values stored in the dataframe, and the data are correct in this respect.
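
The same formula can also be evaluated column-wise, without a row-by-row `apply`. The sketch below is one possible way to do it. The function name `recovery_vectorized` and the two synthetic rows are assumptions made for illustration; only the column names come from the dataset.

```python
import pandas as pd

def recovery_vectorized(frame, feed_col, concentrate_col, tail_col):
    """Recovery in percent, computed on whole columns: C*(F-T) / (F*(C-T)) * 100."""
    C = frame[concentrate_col]
    F = frame[feed_col]
    T = frame[tail_col]
    return C * (F - T) / (F * (C - T)) * 100

# Synthetic rows (hypothetical values, not from the dataset)
demo = pd.DataFrame({
    "rougher.input.feed_au": [8.0, 6.0],
    "rougher.output.concentrate_au": [20.0, 18.0],
    "rougher.output.tail_au": [2.0, 1.5],
})
demo_recovery = recovery_vectorized(
    demo,
    "rougher.input.feed_au",
    "rougher.output.concentrate_au",
    "rougher.output.tail_au",
)
print(demo_recovery.round(2).tolist())
```

Rows where F is zero or C equals T give a division by zero (inf or NaN), so it seems sensible to drop or inspect them before computing MAE.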

Let us find all columns that are present in the training set but absent from the test set

In [27]:
# Using a list comprehension, list the features that are missing from the test set (gold_recovery_test_new)
# difference = [x for x in gold_train.columns if x not in gold_test.columns]
# difference
In [28]:
# List the features that are missing from the test set (gold_recovery_test_new):
gold_train_columns_remains = set(gold_train.columns) - set(gold_test.columns)
difference = gold_train[list(gold_train_columns_remains)]
display(list(difference))
['rougher.output.recovery',
 'final.output.concentrate_sol',
 'final.output.tail_pb',
 'rougher.output.tail_sol',
 'primary_cleaner.output.tail_pb',
 'rougher.output.tail_au',
 'final.output.tail_ag',
 'final.output.recovery',
 'secondary_cleaner.output.tail_sol',
 'rougher.output.tail_ag',
 'final.output.concentrate_pb',
 'rougher.calculation.floatbank10_sulfate_to_au_feed',
 'primary_cleaner.output.tail_ag',
 'rougher.output.tail_pb',
 'rougher.output.concentrate_ag',
 'final.output.tail_au',
 'rougher.calculation.sulfate_to_au_concentrate',
 'primary_cleaner.output.tail_sol',
 'primary_cleaner.output.concentrate_pb',
 'secondary_cleaner.output.tail_ag',
 'secondary_cleaner.output.tail_au',
 'rougher.calculation.au_pb_ratio',
 'primary_cleaner.output.concentrate_ag',
 'rougher.calculation.floatbank11_sulfate_to_au_feed',
 'rougher.output.concentrate_au',
 'final.output.tail_sol',
 'primary_cleaner.output.concentrate_sol',
 'secondary_cleaner.output.tail_pb',
 'rougher.output.concentrate_sol',
 'final.output.concentrate_ag',
 'rougher.output.concentrate_pb',
 'primary_cleaner.output.concentrate_au',
 'primary_cleaner.output.tail_au',
 'final.output.concentrate_au']

Comment:

Feature names follow this pattern:

[stage].[parameter_type].[parameter_name].

The features unavailable in the test set have the following types and parameters (they are also listed in the "Addendum" section above):

output.concentrate - concentration of the metals (gold au, silver ag, lead pb) and of the solvent/colloid (sol) in the product (output) at the different cleaning stages;
output.tail - tailings (tail) of the product (output) at the different cleaning stages;
calculation - calculated characteristics:

  • rougher.calculation.au_pb_ratio - calculated ratio of gold (au) to lead (pb) at the rougher (flotation) stage;
  • rougher.calculation.floatbank10_sulfate_to_au_feed and rougher.calculation.floatbank11_sulfate_to_au_feed - calculated ratio of sulfate to gold (au) in the feed of flotation banks (floatbank) 10 and 11 at the rougher stage;
  • rougher.calculation.sulfate_to_au_concentrate - calculated ratio of sulfate to gold (au) in the concentrate at the rougher stage.

Target features:

rougher.output.recovery - recovery of the rougher concentrate;
final.output.recovery - recovery of the final concentrate.
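
The naming pattern makes it easy to group columns by stage or parameter type. Here is a small sketch, assuming the three parts are always separated by the first two dots; `split_feature_name` is a hypothetical helper, and the list below is only a sample of the missing columns.

```python
from collections import Counter

def split_feature_name(name):
    """Split '[stage].[parameter_type].[parameter_name]' into its three parts."""
    stage, parameter_type, parameter_name = name.split(".", 2)
    return stage, parameter_type, parameter_name

# A few names taken from the list of columns missing in the test set
missing_columns = [
    "rougher.output.recovery",
    "final.output.concentrate_sol",
    "rougher.calculation.au_pb_ratio",
    "secondary_cleaner.output.tail_ag",
    "primary_cleaner.output.concentrate_au",
]

# Count how many of the sampled columns belong to each parameter type
type_counts = Counter(split_feature_name(c)[1] for c in missing_columns)
print(type_counts)
```

Applied to the full list, a count like this would show at a glance that only `output` and `calculation` parameters are missing from the test set.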

Handle missing recovery values: rows without a target value are dropped

In [29]:
gold_train = gold_train.dropna(subset=["rougher.output.recovery","final.output.recovery"],axis = 0)

Handle missing values

In [30]:
#imputer = KNNImputer()
#without_nan = pd.DataFrame(data = imputer.fit_transform(gold_train.drop(['date'],axis = 1)),
             #columns= gold_train.drop(['date'],axis = 1).columns,
             #index = gold_train.index)
In [31]:
without_nan = gold_train.drop('date', axis=1).ffill(axis=0)
test_without_nan = gold_test.drop('date', axis=1).ffill(axis=0)
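
Forward fill copies the last known value downward. This is reasonable only if consecutive rows are measurements that lie close together in time, which is assumed here. The toy frame below (hypothetical values) shows the behaviour, including the fact that a gap at the very start of the series stays unfilled.

```python
import numpy as np
import pandas as pd

# Hypothetical readings ordered in time; NaN marks a missing measurement
readings = pd.DataFrame({"feed_rate": [np.nan, 410.0, np.nan, np.nan, 432.0]})

# Each NaN takes the most recent non-missing value above it
filled = readings.ffill(axis=0)

# A leading NaN has nothing above it, so it remains missing
remaining_nans = int(filled["feed_rate"].isna().sum())
print(filled["feed_rate"].tolist(), remaining_nans)
```

Any values that remain missing after `ffill` still need an imputer or a `dropna` later in the pipeline.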
In [32]:
gold_test = gold_test.dropna(subset= ['date'],axis = 0)
gold_test_for_target = gold_test.merge(gold_full.loc[:,['date',"rougher.output.recovery",
                                                           "final.output.recovery"]],on = 'date')
gold_test_for_target = gold_test_for_target.dropna(subset=["rougher.output.recovery",
                                                             "final.output.recovery"],axis = 0)
gold_test = gold_test_for_target.loc[:,gold_test.columns]
target_test_rougher = gold_test_for_target["rougher.output.recovery"]
target_test_final = gold_test_for_target["final.output.recovery"]
gold_test = gold_test.drop("date",axis =1)
In [33]:
#test_without_nan = pd.DataFrame(data=imputer.transform(gold_test.drop(['date'],axis = 1)),
                                #columns=gold_test.drop(['date'],axis = 1).columns,
                                #index=gold_test.index)

Intermediate conclusion:

  1. The files were opened and the data loaded.
  2. The dataframes contained missing values (a small share overall); these were filled with forward fill (ffill).
  3. The dataframes contain no duplicates, and the data types need no conversion.
  4. The training set (gold_train) contained extra features: columns absent from the test set (gold_test) because they cannot be computed until the process has finished. These columns are excluded from the training features.
  5. The heatmaps of all dataframes show regions of highly correlated features. They look alike for the training and test sets, which tentatively suggests that the two samples are homogeneous. The full dataset has more features, but its overall pattern visually matches the training and test sets.
  6. Data preprocessing is complete.

Data analysis¶

How the concentration of metals (Au, Ag, Pb) changes across the cleaning stages, with conclusions.¶

Metal concentration at the different stages:

rougher.input.feed_ - in the feed;
rougher.output.concentrate_ - in the rougher concentrate;
primary_cleaner.output.concentrate_ - in the concentrate after primary cleaning;
final.output.concentrate_ - in the final concentrate.

In [34]:
print("Concentration before flotation")
print("Silver:{: 0.2f}, Lead:{: 0.2f}, Gold:{: 0.2f}".format(
    without_nan["rougher.input.feed_ag"].mean(),
    without_nan["rougher.input.feed_pb"].mean(),
    without_nan["rougher.input.feed_au"].mean()))
print()
print("Concentration after flotation")
print("Silver:{: 0.2f}, Lead:{: 0.2f}, Gold:{: 0.2f}".format(
    without_nan["rougher.output.concentrate_ag"].mean(),
    without_nan["rougher.output.concentrate_pb"].mean(),
    without_nan["rougher.output.concentrate_au"].mean()))
print()
print("Concentration after primary cleaning")
print("Silver:{: 0.2f}, Lead:{: 0.2f}, Gold:{: 0.2f}".format(
    without_nan["primary_cleaner.output.concentrate_ag"].mean(),
    without_nan["primary_cleaner.output.concentrate_pb"].mean(),
    without_nan["primary_cleaner.output.concentrate_au"].mean()))
print()
print("Concentration after secondary cleaning")
print("Silver:{: 0.2f}, Lead:{: 0.2f}, Gold:{: 0.2f}".format(
    without_nan["final.output.concentrate_ag"].mean(),
    without_nan["final.output.concentrate_pb"].mean(),
    without_nan["final.output.concentrate_au"].mean()))
Concentration before flotation
Silver: 8.58, Lead: 3.51, Gold: 7.87

Concentration after flotation
Silver: 11.78, Lead: 7.66, Gold: 19.44

Concentration after primary cleaning
Silver: 8.20, Lead: 9.57, Gold: 32.39

Concentration after secondary cleaning
Silver: 5.14, Lead: 10.13, Gold: 44.00

Let us plot histograms of the concentration of each metal at the different stages (feed, rougher concentrate, concentrate after primary cleaning, final concentrate) in the training set (dataset gold_recovery_train_new).

In [35]:
metals = ['au', 'ag', 'pb']
stages = [
    'rougher.input.feed_',
    'rougher.output.concentrate_',
    'primary_cleaner.output.concentrate_',
    'final.output.concentrate_',
]

print('\033[1m' + 'Maximum metal concentrations in the feed, rougher concentrate, '
      'primary-cleaner concentrate and final concentrate:' + '\033[0m')
print()

legend_labels = ['feed', 'rougher concentrate', 'primary-cleaner concentrate', 'final concentrate']

for metal in metals:
    plt.figure(figsize=(16, 5))
    plt.grid(True)
    plt.xlabel('Concentration')
    plt.ylabel('Number of measurements')
    plt.title(f'Distribution of {metal} concentration in the feed and in the rougher, primary-cleaner and final concentrates')
    for stage in stages:
        concentration = gold_train[f'{stage}{metal}']
        plt.hist(concentration, bins=80, alpha=0.5, edgecolor='black')
        # Report the maximum for this stage, truncated to an integer
        print(f'{stage}{metal}:', int(concentration.max()))
    # One legend per figure, in the same order as the stages
    plt.legend(legend_labels)
Maximum metal concentrations in the feed, rougher concentrate, primary-cleaner concentrate and final concentrate:

rougher.input.feed_au: 13
rougher.output.concentrate_au: 28
primary_cleaner.output.concentrate_au: 45
final.output.concentrate_au: 52
rougher.input.feed_ag: 14
rougher.output.concentrate_ag: 24
primary_cleaner.output.concentrate_ag: 16
final.output.concentrate_ag: 16
rougher.input.feed_pb: 7
rougher.output.concentrate_pb: 13
primary_cleaner.output.concentrate_pb: 17
final.output.concentrate_pb: 17

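Comment:

The figures above show that the mean concentration of gold and lead rises at every stage (gold: 7.87 → 19.44 → 32.39 → 44.00; lead: 3.51 → 7.66 → 9.57 → 10.13). Silver rises after flotation (8.58 → 11.78) and then falls with each cleaning stage, ending below its level in the feed (5.14).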
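
The per-stage means are easier to compare side by side in one table. Below is a minimal sketch of such a table. The helper `mean_concentration_table` and the tiny synthetic frame are assumptions for illustration; on the real data one would pass the prepared training frame and all four stage prefixes.

```python
import pandas as pd

def mean_concentration_table(frame, stage_prefixes, metals):
    """Return a stage-by-metal table of mean concentrations."""
    rows = {
        prefix: {metal: frame[f"{prefix}{metal}"].mean() for metal in metals}
        for prefix in stage_prefixes
    }
    # Rows are stages, columns are metals (in the requested order)
    return pd.DataFrame.from_dict(rows, orient="index")[metals]

# Tiny synthetic frame (hypothetical values) with the same column naming scheme
demo = pd.DataFrame({
    "rougher.input.feed_au": [8.0, 7.0],
    "rougher.input.feed_ag": [9.0, 8.0],
    "final.output.concentrate_au": [44.0, 45.0],
    "final.output.concentrate_ag": [5.0, 5.5],
})
table = mean_concentration_table(
    demo,
    ["rougher.input.feed_", "final.output.concentrate_"],
    ["au", "ag"],
)
print(table)
```

Reading down a column then shows how one metal changes from stage to stage.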

In [36]:
for frame, name in zip([without_nan, test_without_nan],
                       ["train", "test"]):
    subset = frame["rougher.input.feed_size"]

    # Kernel density estimate of the feed particle size for each sample
    sns.kdeplot(subset, label=name)
plt.legend(prop={'size': 10}, title='Frame')
plt.title('Feed particle size')
Out[36]:
Text(0.5, 1.0, 'Feed particle size')

Comment:

The distributions differ slightly. In the training set, values around 50 and above dominate, whereas in the test set a fairly large share of observations lies below 50. The means show the same pattern: the training mean is about 3 units higher than the test mean.

Total concentration

In [37]:
def summary_of_concentarution(row):
    list_of_steps = ["rougher","primary_cleaner","final"]
    input_feed_au = row["rougher.input.feed_au"]
    input_feed_ag = row["rougher.input.feed_ag"]
    input_feed_pb = row["rougher.input.feed_pb"]
    input_feed_sol = row["rougher.input.feed_sol"]
    out_rougher = []
    out_primary_cleaner = []
    out_final = []
    list_of_arrays =[out_rougher,out_primary_cleaner,out_final]
    for step,array in zip(list_of_steps,list_of_arrays):
        array.append(row[step+".output.concentrate_au"])
        array.append(row[step+".output.concentrate_ag"])
        array.append(row[step+".output.concentrate_pb"])
        array.append(row[step+".output.concentrate_sol"])
    sum_before_steps = input_feed_ag+input_feed_au+input_feed_pb+input_feed_sol
    sum_rougher = sum(out_rougher)
    sum_primary = sum(out_primary_cleaner)
    sum_final = sum(out_final)
    return pd.Series([sum_before_steps,sum_rougher,sum_primary,sum_final])
In [38]:
sum_values = without_nan.apply(summary_of_concentarution,axis = 1)
sum_values.columns = ["before_rougher","rougher","primary_cleaner","final"]
sum_values.head(10)
Out[38]:
before_rougher rougher primary_cleaner final
0 51.68 66.42 72.64 63.64
1 50.66 67.01 72.54 63.96
2 50.61 66.10 72.10 64.31
3 51.06 65.75 59.96 63.57
4 47.86 65.91 71.32 64.00
5 48.84 64.96 70.61 63.65
6 49.12 65.37 71.17 63.16
7 50.79 65.18 71.53 62.91
8 50.55 65.63 72.07 64.19
9 51.94 65.41 71.89 64.19
In [39]:
fig,ax = plt.subplots(4,1,figsize = (20,30))

ax[0].hist(sum_values["before_rougher"],bins = 40)
ax[1].hist(sum_values["rougher"],bins = 40)
ax[2].hist(sum_values["primary_cleaner"],bins = 40)
ax[3].hist(sum_values["final"],bins = 40)
ax[0].set_xlabel("Sum of component shares")
ax[1].set_xlabel("Sum of component shares")
ax[2].set_xlabel("Sum of component shares")
ax[3].set_xlabel("Sum of component shares")
ax[0].set_title("Total concentration before flotation")
ax[1].set_title("Total concentration after flotation")
ax[2].set_title("Total concentration after primary cleaning")
ax[3].set_title("Total concentration after the final stage")
plt.show()

All four distributions show a spike near zero. It lies far from the bulk of the distribution, so these rows are outliers and should be removed. The same applies to the test set, for the same reason.

In [40]:
# Remove outliers: drop every row in which any component concentration is below 1
concentration_prefixes = ["rougher.input.feed",
                          "rougher.output.concentrate",
                          "primary_cleaner.output.concentrate",
                          "final.output.concentrate"]
components = ["au", "ag", "pb", "sol"]
concentration_columns = [f"{prefix}_{component}"
                         for prefix in concentration_prefixes
                         for component in components]

# NaN compares as False, so rows with missing values are kept, as before
outlier_mask = (without_nan[concentration_columns] < 1).any(axis=1)
without_nan = without_nan[~outlier_mask]

Total concentration after removing outliers

In [41]:
# Recompute the totals on the filtered data so that the plots reflect the outlier removal
sum_values = without_nan.apply(summary_of_concentarution, axis=1)
sum_values.columns = ["before_rougher", "rougher", "primary_cleaner", "final"]

fig,ax = plt.subplots(4,1,figsize = (20,30))

ax[0].hist(sum_values["before_rougher"],bins = 40)
ax[1].hist(sum_values["rougher"],bins = 40)
ax[2].hist(sum_values["primary_cleaner"],bins = 40)
ax[3].hist(sum_values["final"],bins = 40)
ax[0].set_xlabel("Sum of component shares")
ax[1].set_xlabel("Sum of component shares")
ax[2].set_xlabel("Sum of component shares")
ax[3].set_xlabel("Sum of component shares")
ax[0].set_title("Total concentration before flotation")
ax[1].set_title("Total concentration after flotation")
ax[2].set_title("Total concentration after primary cleaning")
ax[3].set_title("Total concentration after the final stage")
plt.show()
In [42]:
# gold_test = gold_test.drop(index = gold_test[gold_test["rougher.input.feed_au"] < 1].index,
                               #axis = 0)
# gold_test = gold_test.drop(index = gold_test[gold_test["rougher.input.feed_ag"] < 1].index,
                               #axis = 0)
# gold_test = gold_test.drop(index = gold_test[gold_test["rougher.input.feed_pb"] < 1].index,
                               #axis = 0)
# gold_test = gold_test.drop(index = gold_test[gold_test["rougher.input.feed_sol"] < 1].index,
                               #axis = 0)
In [43]:
train_features_dataset_rougher = without_nan.loc[:,gold_test.columns]
train_target_dataset_rougher = without_nan['rougher.output.recovery']
train_features_dataset_final = train_features_dataset_rougher 
train_target_dataset_final = without_nan['final.output.recovery']

Model¶

In [44]:
def smape_function(y_true, y_pred):
    error = np.abs(y_true - y_pred)
    scale = (np.abs(y_true) + np.abs(y_pred)) / 2
    return np.mean(error / scale) * 100
In [45]:
def final_smape_function(rougher, final):
    score_final = 0.25 * rougher + 0.75 * final
    return score_final
In [46]:
smape_scorer = make_scorer(smape_function, greater_is_better=False)
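
To make the metric concrete, here is a small worked sketch of sMAPE and of the weighted final score (25% rougher, 75% final). The functions mirror the ones defined above but are re-declared under the assumed names `smape` and `weighted_smape` so that the block runs on its own; the input values are illustrative.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent: mean(|y - y_hat| / ((|y| + |y_hat|) / 2)) * 100."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scale = (np.abs(y_true) + np.abs(y_pred)) / 2
    return float(np.mean(np.abs(y_true - y_pred) / scale) * 100)

def weighted_smape(smape_rougher, smape_final):
    """Final metric: 25% of the rougher-stage sMAPE plus 75% of the final-stage sMAPE."""
    return 0.25 * smape_rougher + 0.75 * smape_final

# Illustrative values only
rougher_score = smape([100.0, 50.0], [110.0, 50.0])  # one 10-unit miss, one exact hit
final_score = smape([60.0, 80.0], [60.0, 80.0])      # perfect prediction
total = weighted_smape(rougher_score, final_score)
print(rougher_score, final_score, total)
```

If a true value and its prediction are both zero, the denominator is zero and the term becomes NaN, so zero targets are worth checking before scoring.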

Pipeline initialization

In [47]:
imputer = KNNImputer()
model = make_pipeline(imputer, StandardScaler(), RandomForestRegressor())
model2 = make_pipeline(imputer, StandardScaler(), DecisionTreeRegressor())

⚠️
Standardization does not matter for trees, and in fact it has no effect on plain linear regression either, because any rescaling of a variable can be absorbed by the coefficients:

$y = \alpha + \beta X$ (without standardization)

$y = \alpha_{st} + \beta_{st} \frac{X-mean}{std} = (\alpha_{st} - \frac{\beta_{st} \cdot mean}{std}) + (\frac{\beta_{st}}{std}) X$ (with standardization)

Here $\alpha = (\alpha_{st} - \frac{\beta_{st} \cdot mean}{std})$ and $\beta = (\frac{\beta_{st}}{std})$.

The links explain when standardization is nevertheless essential: here and here.
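
The argument can be checked numerically. The sketch below fits ordinary least squares with plain NumPy (`np.linalg.lstsq`, used here instead of sklearn's `LinearRegression` to keep the block dependency-light) on raw and on standardized features and compares the predictions. The synthetic data and the helper `fit_predict_ols` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features with a non-zero mean and non-unit scale, plus a noisy linear target
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = 2.0 + X @ np.array([0.5, -1.0, 3.0]) + rng.normal(scale=0.1, size=200)

def fit_predict_ols(features, target):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef

# Standardize each column: subtract the mean, divide by the standard deviation
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

pred_raw = fit_predict_ols(X, y)
pred_std = fit_predict_ols(X_standardized, y)

# Largest absolute difference between the two sets of predictions
max_gap = float(np.max(np.abs(pred_raw - pred_std)))
print(max_gap)
```

The two sets of predictions should differ only by floating-point noise. The equivalence does not carry over to regularized models (Ridge, Lasso) or to distance-based methods such as KNN.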

In [48]:
# Parameter grids for GridSearchCV
params_RF = {"randomforestregressor__n_estimators":[5,100],
             "randomforestregressor__max_depth":[1,10]}
params_DT = {"decisiontreeregressor__max_depth":[1,10]}
In [49]:
grid_rougher = GridSearchCV(model,param_grid = params_RF,scoring=smape_scorer)
grid_final = GridSearchCV(model,param_grid = params_RF,scoring=smape_scorer)
grid_rougher_DT = GridSearchCV(model2,param_grid = params_DT,scoring=smape_scorer)
grid_final_DT = GridSearchCV(model2,param_grid = params_DT,scoring=smape_scorer)

Let us train the different models and evaluate their quality with cross-validation.

In [50]:
grid_rougher.fit(train_features_dataset_rougher,train_target_dataset_rougher)
grid_final.fit(train_features_dataset_final,train_target_dataset_final)
grid_rougher_DT.fit(train_features_dataset_rougher,train_target_dataset_rougher)
grid_final_DT.fit(train_features_dataset_final,train_target_dataset_final)
Out[50]:
GridSearchCV(estimator=Pipeline(steps=[('knnimputer', KNNImputer()),
                                       ('standardscaler', StandardScaler()),
                                       ('decisiontreeregressor',
                                        DecisionTreeRegressor())]),
             param_grid={'decisiontreeregressor__max_depth': [1, 10]},
             scoring=make_scorer(smape_function, greater_is_better=False))
In [51]:
print("Best models")
print(grid_rougher.best_estimator_)
print(grid_final.best_estimator_)
print(grid_rougher_DT.best_estimator_)
print(grid_final_DT.best_estimator_)
Best models
Pipeline(steps=[('knnimputer', KNNImputer()),
                ('standardscaler', StandardScaler()),
                ('randomforestregressor', RandomForestRegressor(max_depth=10))])
Pipeline(steps=[('knnimputer', KNNImputer()),
                ('standardscaler', StandardScaler()),
                ('randomforestregressor', RandomForestRegressor(max_depth=10))])
Pipeline(steps=[('knnimputer', KNNImputer()),
                ('standardscaler', StandardScaler()),
                ('decisiontreeregressor', DecisionTreeRegressor(max_depth=1))])
Pipeline(steps=[('knnimputer', KNNImputer()),
                ('standardscaler', StandardScaler()),
                ('decisiontreeregressor', DecisionTreeRegressor(max_depth=1))])
In [52]:
best_score_rougher = grid_rougher.best_score_
best_score_final = grid_final.best_score_
best_score_rougher_DT = grid_rougher_DT.best_score_
best_score_final_DT = grid_final_DT.best_score_

Let us determine the best model for each target feature

In [53]:
print("rougher")
print("RandomForest",best_score_rougher,"DecisionTree",best_score_rougher_DT)
print("final")
print("RandomForest",best_score_final,"DecisionTree",best_score_final_DT)
rougher
RandomForest -6.951732156227116 DecisionTree -7.805160271844767
final
RandomForest -9.177172912877552 DecisionTree -9.200627976505208

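RandomForest performs best on both targets, so we take it as the final model. The scores are negative because the scorer was created with greater_is_better=False: a value closer to zero means a lower sMAPE.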
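The negative values come from the scorer, not from the metric itself. Below is a minimal sketch, assuming that `smape_function` follows the usual symmetric MAPE definition (the notebook's own implementation is defined earlier and may differ in details). The helper names `smape` and `neg_smape_score` are illustrative and are not part of the notebook.

```python
def smape(y_true, y_pred):
    """Symmetric MAPE in percent; an assumed definition, not the notebook's own code."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        # Treat a 0/0 term as zero error to avoid division by zero.
        total += abs(t - p) / denom if denom else 0.0
    return 100 * total / len(y_true)


def neg_smape_score(y_true, y_pred):
    # Mirrors the effect of make_scorer(..., greater_is_better=False):
    # the metric is negated so that a higher score still means a better model.
    return -smape(y_true, y_pred)


y_true = [80.0, 90.0, 100.0]
good = [82.0, 88.0, 101.0]   # close predictions
bad = [60.0, 110.0, 70.0]    # distant predictions

scores = {"good": neg_smape_score(y_true, good), "bad": neg_smape_score(y_true, bad)}
best = max(scores, key=scores.get)  # the search keeps the highest (least negative) score
print(scores, best)
```

With this convention, picking the maximum score is the same as picking the minimum sMAPE.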

Best cross-validation results on the training set

Final sMAPE of the random forest

In [54]:
final_smape_function(best_score_rougher,best_score_final)
Out[54]:
-8.620812723714943

Final sMAPE of the decision tree

In [55]:
final_smape_function(best_score_rougher_DT,best_score_final_DT)
Out[55]:
-8.851761050340098

The random forest gives the smallest error: 8.62 versus 8.85 for the decision tree (the outputs above carry the scorer's negative sign).
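The two aggregate values above agree with a weighted sum of 25% rougher sMAPE and 75% final sMAPE. The sketch below reproduces them from the per-target scores printed earlier. The weights are inferred from those outputs, and `combine_smape` is an illustrative stand-in for the notebook's `final_smape_function`, whose source is not shown in this section.

```python
def combine_smape(smape_rougher, smape_final, w_rougher=0.25, w_final=0.75):
    # Assumed weighting, inferred from Out[54] and Out[55].
    return w_rougher * smape_rougher + w_final * smape_final


# Cross-validation scores from the notebook (negated sMAPE values).
rf = combine_smape(-6.951732156227116, -9.177172912877552)
dt = combine_smape(-7.805160271844767, -9.200627976505208)
print(rf, dt)
```

Because the final stage carries three quarters of the weight, the narrow gap between the two models on the final target shapes most of the aggregate difference.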

In [56]:
rougher_test = grid_rougher.predict(test_without_nan)
In [57]:
final_test = grid_final.predict(test_without_nan)

Final error on the test set

In [58]:
final_smape_function(smape_function(target_test_rougher, rougher_test), smape_function(target_test_final, final_test))
Out[58]:
9.224474286678962

Initialize and fit a constant baseline model

In [59]:
base = DummyRegressor(strategy="median")
base_final = DummyRegressor(strategy="median")
In [60]:
base.fit(train_features_dataset_rougher,train_target_dataset_rougher)
base_final.fit(train_features_dataset_final,train_target_dataset_final)
Out[60]:
DummyRegressor(strategy='median')
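For context, `DummyRegressor(strategy="median")` ignores the features and always predicts the median of the training target. Below is a minimal pure-Python sketch of that behaviour. The `MedianBaseline` class is a hypothetical stand-in written for illustration, not scikit-learn code.

```python
from statistics import median


class MedianBaseline:
    """Hypothetical stand-in that mimics a constant median regressor."""

    def fit(self, X, y):
        # The features X are ignored; only the target median is stored.
        self.constant_ = median(y)
        return self

    def predict(self, X):
        # One identical prediction per input row.
        return [self.constant_ for _ in X]


model = MedianBaseline().fit([[1], [2], [3]], [70.0, 90.0, 80.0])
preds = model.predict([[10], [20]])
print(preds)
```

Any useful model should beat such a baseline, which is why it serves as a sanity check here.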

Predict the target values

In [61]:
base_rougher = base.predict(test_without_nan)
base_final_pred = base_final.predict(test_without_nan)
In [62]:
final_smape_function(smape_function(target_test_rougher,base_rougher),smape_function(target_test_final,base_final_pred))
Out[62]:
9.41968128927604
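
To put the two test errors side by side, here is a small sketch that computes the relative gain of the model over the constant baseline, using the values from Out[58] and Out[62]. The helper name `relative_improvement` is illustrative.

```python
def relative_improvement(model_error, baseline_error):
    # Share of the baseline error removed by the model (0.0 means no gain).
    return (baseline_error - model_error) / baseline_error


model_smape = 9.224474286678962     # Out[58], random forest on the test set
baseline_smape = 9.41968128927604   # Out[62], DummyRegressor(strategy="median")

gain = relative_improvement(model_smape, baseline_smape)
print(f"{gain:.1%}")
```

On these numbers the gain is roughly 2%, so the model beats the baseline only by a small margin.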

Intermediate conclusion:

In this section we built the model:

  1. Wrote a function that computes the final sMAPE;
  2. Trained several models and evaluated them with cross-validation;
  3. Selected the best model and checked it on the test set.

Conclusion¶

  • The final model is based on the RandomForest algorithm
  • Error on the test set: sMAPE 9.22 (constant median baseline: 9.42)
  • Algorithm parameters: {max_depth: 10}
  • The recovery value in the training set was calculated correctly
  • The original test set is missing 34 features

Project readiness checklist¶

  • Jupyter Notebook is open
  • All code runs without errors
  • Code cells are arranged in execution order
  • Step 1 is done: the data is prepared
    • The enrichment efficiency (recovery) formula is verified
    • The features unavailable in the test set are analyzed
    • The data is preprocessed
  • Step 2 is done: the data is analyzed
    • The change in element concentrations at each stage is examined
    • The feed particle size distributions in the training and test sets are compared
    • The total concentrations are examined
  • Step 3 is done: the prediction model is built
    • A function that computes the final sMAPE is written
    • Several models are trained and validated
    • The best model is selected and its quality is checked on the test set